[Dock-fans] failure on dock.mpi ==> NEWS!

Scott Brozell sbrozell at scripps.edu
Tue Jul 10 19:39:26 PDT 2007


Hi,

On Tue, 10 Jul 2007, Alessandro Nascimento wrote:

> my files are distributed over RAID... however, I tried to reboot my
> system and restart again.
> The same error occurs.
>
> However, I noticed that if I change this line in my dock input file:
>
> grid_score_secondary       yes ==> no
>
> My docking runs well until the end in parallel.

What exactly does that mean ?
If you make this change
> grid_score_secondary       yes ==> no
then with everything else the same dock.mpi completes successfully ?
How does that output compare with the serial output ?
Please send me your complete dock input file;
I'll try to reproduce the failure.
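A minimal sketch of the comparison I have in mind, assuming the serial and parallel builds write to separately named output files (the file names and the helper below are illustrative, not part of DOCK):

```shell
# Run both builds on the same input (paths as in this thread):
#
#   dock6 -i rigid.in -o rigid.out.serial
#   mpiexec -n 5 dock6.mpi -i rigid.in -o rigid.out.parallel
#
# Then compare the two result files; a diff localizes any discrepancy
# to the parallel code path.
compare_dock_outputs() {
    if diff -q "$1" "$2" >/dev/null 2>&1; then
        echo "serial and parallel outputs match"
    else
        echo "outputs differ; run: diff $1 $2"
    fi
}
```

Usage: compare_dock_outputs rigid.out.serial rigid.out.parallel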

Scott

> So, I might be experiencing some problem with this function when I run
> dock in parallel.
> Does this information help you figure out what might be happening?
>
> Best regards,
>
>
> -alessandro
>
>
> On 7/10/07, Scott Brozell <sbrozell at scripps.edu> wrote:
> > Hi,
> >
> > On Tue, 10 Jul 2007, Alessandro Nascimento wrote:
> >
> > > using the same mpich implementation, I compiled pmemd (the particle
> > > mesh Ewald software in Amber 9). It is now running okay on
> > > 5 nodes (6 processors) of my cluster.
> > > However, neither the dock parallel tests nor my own dock runs work there...
> > > Just to mention, my machines are Intel Xeon 64 (3.4 GHz) on a Gigabit
> > > network (I don't know whether hardware matters in this kind of
> > > problem).
> > > As was mentioned, it may be an MPI problem rather than a dock problem. I
> > > just would like to know whether anyone else has seen the same problem...
> > >
> > > Initializing MPI Routines...
> > > Initializing MPI Routines...
> > > Initializing MPI Routines...
> > > Initializing MPI Routines...
> > > tail -f rigid.out
> > > cluster_rmsd_threshold                                       2.0
> > > num_clusterheads_for_rescore                                 5
> > > num_secondary_scored_conformers_written                      4
> > > rank_primary_ligands                                         no
> > > rank_secondary_ligands                                       no
> > >
> > > Initializing Library File Routines...
> > > Initializing Orienting Routines...
> > > Initializing Grid Score Routines...
> > >  Reading the energy grid from grid.nrg
> > > [cli_1]: aborting job:
> >
> > One possibility is that your file system is not distributed
> > across nodes; thus some processes may not be able to read
> > the grid file.  Have you done a distributed copy of all of the
> > input files ?
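If there is no shared filesystem, a distributed copy can be sketched like this; the node names and remote path are placeholders for your cluster, and the helper prints the scp commands (drop the echo to actually copy):

```shell
# Print one scp command per node, copying the given input files to the
# same working directory on each machine (dry run; remove 'echo' to copy).
distribute_inputs() {
    nodes=$1; remote_dir=$2; shift 2
    for node in $nodes; do
        echo scp "$@" "$node:$remote_dir"
    done
}
# Placeholder node names and path; grid.nrg and rigid.in are the
# input files mentioned in this thread.
distribute_inputs "node1 node2 node3 node4 node5" /home/apps/dock/run \
    rigid.in grid.nrg
```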
> >
> > Have you examined the per process dock output files, ie,
> > rigid.out.1, rigid.out.2, ...
> > There may be more specific error messages in them.
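A quick way to scan those per-process files, assuming the rigid.out.N naming above (each MPI rank often reports the real failure only in its own file):

```shell
# Print each per-process output file name and any lines that look like
# error reports; note when a file contains none.
scan_dock_outputs() {
    for f in rigid.out.*; do
        [ -e "$f" ] || continue
        echo "== $f =="
        grep -i -E 'error|fail|abort' "$f" || echo "(no error lines)"
    done
}
scan_dock_outputs
```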
> >
> > Scott
> >
> > > Fatal error in MPI_Recv: Other MPI error, error stack:
> > > MPI_Recv(186).............................:
> > > MPI_Recv(buf=0x7fffffc48f1c, count=2, MPI_INT, src=0, tag=100,
> > > MPI_COMM_WORLD, status=0x7fffffc46620) failed
> > > MPIDI_CH3_Progress_wait(212)..............: an error occurred while
> > > handling an event returned by MPIDU_Sock_Wait()
> > > MPIDI_CH3I_Progress_handle_sock_event(413):
> > > MPIDU_Socki_handle_read(633)..............: connection failure
> > > (set=0,sock=1,errno=104:Connection reset by peer)
> > > rank 0 in job 6  ruska_34405   caused collective abort of all ranks
> > >   exit status of rank 0: killed by signal 11
> > >
> > > -----------------------------------
> > > Molecule: ZINC00000012
> > >
> > >  Anchors:               1
> > >  Orientations:          120000
> > >  Conformations:         2375
> > >
> > >  Primary Score
> > > [2]   Exit 11                 /home/apps/mpich2/bin/mpiexec -n 5
> > > /home/apps/dock/dock6/bin/dock6.mpi -i rigid.in -o rigid.out
> > > </dev/null


More information about the Dock-fans mailing list