[Dock-fans] failure on dock.mpi

Scott Brozell sbrozell at scripps.edu
Tue Jul 10 13:23:35 PDT 2007


Hi,

On Tue, 10 Jul 2007, Alessandro Nascimento wrote:

> Using the same mpich implementation, I compiled pmemd (the particle
> mesh Ewald implementation in Amber 9). It is now running fine on my
> cluster on the 5 nodes (6 processors).
> However, neither the dock parallel tests nor my own dock runs work there....
> Just to mention, my machines are Intel Xeon 64 (3.4 GHz) on a Gigabit
> network (I don't know whether hardware issues matter for this kind of
> problem).
> As was mentioned, it may be an MPI problem rather than a dock problem. I
> would just like to know whether anyone else has seen the same problem....
>
> Initializing MPI Routines...
> Initializing MPI Routines...
> Initializing MPI Routines...
> Initializing MPI Routines...
> tail -f rigid.out
> cluster_rmsd_threshold                                       2.0
> num_clusterheads_for_rescore                                 5
> num_secondary_scored_conformers_written                      4
> rank_primary_ligands                                         no
> rank_secondary_ligands                                       no
>
> Initializing Library File Routines...
> Initializing Orienting Routines...
> Initializing Grid Score Routines...
>  Reading the energy grid from grid.nrg
> [cli_1]: aborting job:

One possibility is that your file system is not shared across
the nodes; in that case some processes may not be able to read
the grid file.  Have you done a distributed copy of all of the
input files?
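
A minimal sketch of such a check, to be run from the job's working
directory on every node. The function name check_inputs is
hypothetical, and the default file names rigid.in and grid.nrg are
simply the ones visible in the quoted log; a real run would need the
full list of input files passed as arguments.

```python
import os
import socket
import sys


def check_inputs(paths):
    """Return (path, problem) pairs for files this node cannot use."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append((path, "missing"))
        elif not os.access(path, os.R_OK):
            problems.append((path, "not readable"))
        elif os.path.getsize(path) == 0:
            problems.append((path, "empty"))
    return problems


if __name__ == "__main__":
    # Assumed defaults taken from the log; pass the real file list as arguments.
    paths = sys.argv[1:] or ["rigid.in", "grid.nrg"]
    host = socket.gethostname()
    problems = check_inputs(paths)
    for path, problem in problems:
        print(f"{host}: {path}: {problem}")
    if not problems:
        print(f"{host}: all {len(paths)} input files are readable")
```

If any node reports a missing or empty file, the processes started
there cannot read their input, which would be consistent with a
failure right after "Reading the energy grid".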

Have you examined the per-process dock output files, i.e.,
rigid.out.1, rigid.out.2, ...?
There may be more specific error messages in them.
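
As a rough sketch, the per-process files can be scanned in one pass.
The function name scan_outputs is hypothetical, and the keyword list
is only a guess at what an error line might contain; it is not taken
from the actual dock messages, so adjust it to whatever the files
really show.

```python
import glob
import re

# Assumed keywords; not a list of actual dock error messages.
KEYWORDS = re.compile(r"error|abort|cannot|could not|unable|fail", re.IGNORECASE)


def scan_outputs(pattern="rigid.out.*"):
    """Return (file, line number, text) for each suspicious line found."""
    hits = []
    for name in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a file has stray bytes.
        with open(name, errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if KEYWORDS.search(line):
                    hits.append((name, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    for name, lineno, text in scan_outputs():
        print(f"{name}:{lineno}: {text}")
```

A file that stops earlier than the others is also worth a look, even
when none of its lines match a keyword.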

Scott

> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............................:
> MPI_Recv(buf=0x7fffffc48f1c, count=2, MPI_INT, src=0, tag=100,
> MPI_COMM_WORLD, status=0x7fffffc46620) failed
> MPIDI_CH3_Progress_wait(212)..............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(413):
> MPIDU_Socki_handle_read(633)..............: connection failure
> (set=0,sock=1,errno=104:Connection reset by peer)
> rank 0 in job 6  ruska_34405   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 11
>
> -----------------------------------
> Molecule: ZINC00000012
>
>  Anchors:               1
>  Orientations:          120000
>  Conformations:         2375
>
>  Primary Score
> [2]   Exit 11                 /home/apps/mpich2/bin/mpiexec -n 5
> /home/apps/dock/dock6/bin/dock6.mpi -i rigid.in -o rigid.out
> </dev/null

