Hi,

On 30.04.2013 at 21:26, Vladimir Yamshchikov wrote:
> My recent job started normally but after a few hours of running died with the
> following message:
>
> --------------------------------------------------------------------------
> A daemon (pid 19390) died unexpectedly with status 137 while attempting
> to launch so we are aborting.

I wonder why the failure showed up only after the job had been running for hours. As 137 = 128 + 9, the process was killed by signal 9 (SIGKILL), maybe by the queuing system due to the configured time limit? If you check the accounting, what is the output of:

$ qacct -j <job_id>

(A small sketch of this exit-status arithmetic is appended after the quoted message.)

-- Reuti

> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
> The scheduling script is below:
>
> #$ -S /bin/bash
> #$ -cwd
> #$ -N SC3blastx_64-96thr
> #$ -pe openmpi* 64-96
> #$ -l h_rt=24:00:00,vf=3G
> #$ -j y
> #$ -M yaxi...@gmail.com
> #$ -m eas
> #
> # Load the appropriate module files
> # Should be loaded already
> #$ -V
>
> mpirun -np $NSLOTS blastx -query \
>     $UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta \
>     -db nr \
>     -out $UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out \
>     -evalue 0.001 -max_intron_length 100000 -outfmt 5 -num_alignments 20 \
>     -lcase_masking -num_threads $NSLOTS
>
> What caused this termination? It does not seem to be a scheduling problem, as the
> program ran for several hours with 96 threads. My $LD_LIBRARY_PATH does have
> the /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify it?
>
> Vladimir
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
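P.S. For reference, a minimal sketch of the exit-status arithmetic mentioned above; the variable name and the hard-coded value are only illustrative, taken from the error message, not from the job's actual accounting record:

    # Exit statuses above 128 conventionally mean "terminated by signal (status - 128)".
    status=137                                            # value reported by mpirun above
    if [ "$status" -gt 128 ]; then
        echo "terminated by signal $((status - 128))"     # 137 - 128 = 9, i.e. SIGKILL
    fi

qacct should show whether the queuing system itself delivered that kill, e.g. because h_rt=24:00:00 was exceeded.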