Hello all,

mpiBLAST-PIO 1.5.0 is installed with Open MPI 1.2.8 and the Intel 10
compilers on Rocks 4.3 with Voltaire InfiniBand and the Voltaire Grid
Stack OFA roll.
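
If it is useful, I can post the list of BTL components available in this
Open MPI build, e.g. the output of (assuming ompi_info sits next to mpirun
in the same prefix):

$ /opt/openmpi_intel/1.2.8/bin/ompi_info | grep btl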

An 8-process parallel job is submitted through SGE:

$ cat sge_submit.sh
#!/bin/bash
#$ -N OMPI-Blast-Job
#$ -S /bin/bash
#$ -cwd
#$ -e err.$JOB_ID.$JOB_NAME
#$ -o out.$JOB_ID.$JOB_NAME
#$ -pe orte 8

export LD_LIBRARY_PATH=/opt/openmpi_intel/1.2.8/lib:/opt/intel/cce/10.1.018/lib:/opt/gridengine/lib/lx26-amd64

#$ -V

/opt/openmpi_intel/1.2.8/bin/mpirun -np $NSLOTS \
    /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp \
    -d Mtub_CDC1551_.faa -i 586_seq.fasta -o test8.out

Every time, it fails with the following error message:

$ cat err.117.OMPI-Blast-Job
[0,1,7][btl_openib_component.c:1371:btl_openib_component_progress]
from compute-0-5.local to: compute-0-11.local error polling HP CQ with
status LOCAL LENGTH ERROR status number 1 for wr_id 11990008 opcode 42
4       0.481518        Bailing out with signal 15
[compute-0-5.local:25702] MPI_ABORT invoked on rank 4 in communicator
MPI_COMM_WORLD with errorcode 0
5       0.487255        Bailing out with signal 15
[compute-0-5.local:25703] MPI_ABORT invoked on rank 5 in communicator
MPI_COMM_WORLD with errorcode 0
6       0.658543        Bailing out with signal 15
[compute-0-5.local:25704] MPI_ABORT invoked on rank 6 in communicator
MPI_COMM_WORLD with errorcode 0
0       0.481974        Bailing out with signal 15
[compute-0-11.local:25698] MPI_ABORT invoked on rank 0 in communicator
MPI_COMM_WORLD with errorcode 0
1       0.660788        Bailing out with signal 15
[compute-0-11.local:25699] MPI_ABORT invoked on rank 1 in communicator
MPI_COMM_WORLD with errorcode 0
2       0.67406 Bailing out with signal 15
[compute-0-11.local:25700] MPI_ABORT invoked on rank 2 in communicator
MPI_COMM_WORLD with errorcode 0
3       0.680739        Bailing out with signal 15
[compute-0-11.local:25701] MPI_ABORT invoked on rank 3 in communicator
MPI_COMM_WORLD with errorcode 0

This happens only with mpiBLAST; parallel GROMACS jobs run fine.

Could you let me know why this error appears and how to resolve it? Is it
due to the Rocks Grid Stack OFA roll?
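
Would it be a sensible test to force the TCP BTL, to check whether the
failure is specific to the openib BTL? I am assuming the standard MCA btl
selection parameter here, i.e. something like:

/opt/openmpi_intel/1.2.8/bin/mpirun -np $NSLOTS --mca btl tcp,self \
    /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp \
    -d Mtub_CDC1551_.faa -i 586_seq.fasta -o test8_tcp.out

If that runs cleanly, I would guess the problem lies in the openib/OFED
layer rather than in mpiBLAST itself.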

Thanks,
Sangamesh
