[Dock-fans] failure on dock.mpi ==> NEWS!

Alessandro Nascimento al.s.nascimento at gmail.com
Tue Jul 10 20:02:21 PDT 2007


Hi Scott,


Sorry if I was not clear.

My guess is that there is something wrong with secondary scoring in my docking.
The input file I attached to this mail runs well in parallel on my cluster.
However, if I enable grid_score_secondary, I get the MPI error.
The same happens with continuous_score_secondary.
I haven't compared the results of the parallel and serial jobs yet; I will
take a look at them.
I can put my receptor file and grid files on an ftp/http server if you
want to reproduce exactly what I am doing.
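
To be concrete, the only thing I toggle in rigid.in is that secondary
scoring flag; the clustering/rescore parameters stay as they are echoed
in rigid.out. Roughly, the relevant block looks like this (values taken
from my own run, not defaults):

    grid_score_secondary                        yes   (with "no", dock6.mpi finishes)
    cluster_rmsd_threshold                      2.0
    num_clusterheads_for_rescore                5
    num_secondary_scored_conformers_written     4
    rank_primary_ligands                        no
    rank_secondary_ligands                      no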

Best Regards,


-alessandro



On 7/10/07, Scott Brozell <sbrozell at scripps.edu> wrote:
> Hi,
>
> On Tue, 10 Jul 2007, Alessandro Nascimento wrote:
>
> > my files are distributed over RAID... however, I tried to reboot my
> > system and restart again.
> > The same error occurs.
> >
> > However, I noticed that if I change the line in my dock input file:
> >
> > grid_score_secondary       yes ==> no
> >
> > My docking runs well until the end in parallel.
>
> What exactly does that mean?
> If you make this change
> > grid_score_secondary       yes ==> no
> then, with everything else the same, dock.mpi completes successfully?
> How does that output compare with the serial output ?
> Please send me your complete dock input file;
> I'll try to reproduce the failure.
>
> Scott
>
> > So, I might be experiencing some problem with this function when I run
> > dock in parallel.
> > Does this information help you figure out what might be happening?
> >
> > Best regards,
> >
> >
> > -alessandro
> >
> >
> > On 7/10/07, Scott Brozell <sbrozell at scripps.edu> wrote:
> > > Hi,
> > >
> > > On Tue, 10 Jul 2007, Alessandro Nascimento wrote:
> > >
> > > > Using the same MPICH implementation, I compiled pmemd (the particle
> > > > mesh Ewald implementation in Amber 9). It is now running fine on my
> > > > cluster on the 5 nodes (6 processors).
> > > > However, neither the dock parallel tests nor my own dock job runs there.
> > > > Just to mention, my machines are Intel Xeon 64 (3.4 GHz) on a Gigabit
> > > > network (I don't know whether hardware matters for this kind of problem).
> > > > As was mentioned, it may be an MPI problem rather than a dock problem;
> > > > I would just like to know whether anyone else has seen the same problem.
> > > >
> > > > Initializing MPI Routines...
> > > > Initializing MPI Routines...
> > > > Initializing MPI Routines...
> > > > Initializing MPI Routines...
> > > > tail -f rigid.out
> > > > cluster_rmsd_threshold                                       2.0
> > > > num_clusterheads_for_rescore                                 5
> > > > num_secondary_scored_conformers_written                      4
> > > > rank_primary_ligands                                         no
> > > > rank_secondary_ligands                                       no
> > > >
> > > > Initializing Library File Routines...
> > > > Initializing Orienting Routines...
> > > > Initializing Grid Score Routines...
> > > >  Reading the energy grid from grid.nrg
> > > > [cli_1]: aborting job:
> > >
> > > One possibility is that your file system is not distributed
> > > across nodes; thus some processes may not be able to read
> > > the grid file.  Have you done a distributed copy of all of the
> > > input files?
> > >
> > > Have you examined the per-process dock output files, i.e.,
> > > rigid.out.1, rigid.out.2, ...
> > > There may be more specific error messages in them.
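> > >
> > > For example, something along these lines (just a sketch; the machines
> > > file and the run directory path below are placeholders for whatever
> > > you actually use, and you may have more input files to copy):
> > >
> > >     # push the dock input and grid files to the same path on every node
> > >     for node in $(cat machines); do
> > >         scp rigid.in grid.nrg $node:/path/to/run/dir/
> > >     done
> > >
> > >     # after a failed run, scan the per-process outputs for errors
> > >     grep -i error rigid.out.*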
> > >
> > > Scott
> > >
> > > > Fatal error in MPI_Recv: Other MPI error, error stack:
> > > > MPI_Recv(186).............................:
> > > > MPI_Recv(buf=0x7fffffc48f1c, count=2, MPI_INT, src=0, tag=100,
> > > > MPI_COMM_WORLD, status=0x7fffffc46620) failed
> > > > MPIDI_CH3_Progress_wait(212)..............: an error occurred while
> > > > handling an event returned by MPIDU_Sock_Wait()
> > > > MPIDI_CH3I_Progress_handle_sock_event(413):
> > > > MPIDU_Socki_handle_read(633)..............: connection failure
> > > > (set=0,sock=1,errno=104:Connection reset by peer)
> > > > rank 0 in job 6  ruska_34405   caused collective abort of all ranks
> > > >   exit status of rank 0: killed by signal 11
> > > >
> > > > -----------------------------------
> > > > Molecule: ZINC00000012
> > > >
> > > >  Anchors:               1
> > > >  Orientations:          120000
> > > >  Conformations:         2375
> > > >
> > > >  Primary Score
> > > > [2]   Exit 11                 /home/apps/mpich2/bin/mpiexec -n 5
> > > > /home/apps/dock/dock6/bin/dock6.mpi -i rigid.in -o rigid.out
> > > > </dev/null
>


-- 
[ ]s

--alessandro
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rigid.in
Type: application/octet-stream
Size: 4163 bytes
Desc: not available
Url : http://blur.compbio.ucsf.edu/pipermail/dock-fans/attachments/20070711/44d5e571/attachment.obj 

