Hello,

I've compiled the MPIBLAST-1.5.0-pio application on a Rocks 4.3, Voltaire InfiniBand based Linux cluster using Open MPI 1.2.8 and the Intel 10 compilers.
The job is not running. Let me explain the configuration.

SGE job script:

$ cat sge_submit.sh
#!/bin/bash
#$ -N OMPI-Blast-Job
#$ -S /bin/bash
#$ -cwd
#$ -e err.$JOB_ID.$JOB_NAME
#$ -o out.$JOB_ID.$JOB_NAME
#$ -pe orte 4
/opt/openmpi_intel/1.2.8/bin/mpirun -np $NSLOTS /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out

The PE orte is:

$ qconf -sp orte
pe_name           orte
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min

Open MPI was built with gridengine support:

# /opt/openmpi_intel/1.2.8/bin/ompi_info | grep gridengine
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)

The SGE error and output files for the job are as follows:

$ cat err.88.OMPI-Blast-Job
error: executing task of job 88 failed:
[compute-0-1.local:06151] ERROR: A daemon on node compute-0-1.local failed to start as expected.
[compute-0-1.local:06151] ERROR: There may be more information available from
[compute-0-1.local:06151] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[compute-0-1.local:06151] ERROR: If the problem persists, please restart the
[compute-0-1.local:06151] ERROR: Grid Engine PE job
[compute-0-1.local:06151] ERROR: The daemon exited unexpectedly with status 1.

$ cat out.88.OMPI-Blast-Job

There is nothing in the output file.

qstat shows the job as running on one of the nodes, but top shows no mpiblast processes running on that node. The ps command shows only the mpirun process:

# ps -ef | grep mpiblast
locuz  4018  4017  0 16:25 ?      00:00:00 /opt/openmpi_intel/1.2.8/bin/mpirun -np 4 /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out
root   4120  4022  0 16:27 pts/0  00:00:00 grep mpiblast

The ibv_rc_pingpong tests work fine.
The output of lsmod:

# lsmod | grep ib
ib_sdp         57788   0
rdma_cm        38292   3 rdma_ucm,rds,ib_sdp
ib_addr        11400   1 rdma_cm
ib_local_sa    14864   1 rdma_cm
ib_mthca      157396   2
ib_ipoib       83928   0
ib_umad        20656   0
ib_ucm         21256   0
ib_uverbs      46896   8 rdma_ucm,ib_ucm
ib_cm          42536   3 rdma_cm,ib_ipoib,ib_ucm
ib_sa          28512   4 rdma_cm,ib_local_sa,ib_ipoib,ib_cm
ib_mad         43432   5 ib_local_sa,ib_mthca,ib_umad,ib_cm,ib_sa
ib_core        70544  14 rdma_ucm,rds,ib_sdp,rdma_cm,iw_cm,ib_local_sa,ib_mthca,ib_ipoib,ib_umad,ib_ucm,ib_uverbs,ib_cm,ib_sa,ib_mad
ipv6          285089  23 ib_ipoib
libata        124585   1 ata_piix
scsi_mod      144529   2 libata,sd_mod

What might be the problem? We've used Voltaire OFA Roll from rocks - Gridstack.

Thanks,
Sangamesh