Hi.

I am having some problems integrating Open MPI 1.2b2 with SGE.

I am running the DLPOLY3 code, built with the PathScale 2.5 compiler suite.
The OS is Red Hat EL4, and the network is Gigabit Ethernet.

When I run interactively (mpirun -np 64 --hostfile ./nodes16_slots4.txt
(...)/DLPOLY.Y), everything works fine. But when I submit the job through
SGE, I get the following error:
   Signal:7 info.si_errno:0(Success) si_code:2()
   Failing at addr:0x4a2823
  (...)
 [node023:07187] mca_btl_tcp_frag_send: writev failed with errno=104
 [node067:06766] mca_btl_tcp_frag_send: writev failed with errno=104
 [node023:07185] mca_btl_tcp_frag_send: writev failed with errno=104
 [node067:06764] mca_btl_tcp_frag_send: writev failed with errno=104
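
On Linux, signal 7 is SIGBUS (si_code 2 is BUS_ADRERR) and errno 104 is ECONNRESET ("Connection reset by peer"). A plausible reading, not a confirmed diagnosis, is that one rank died with the bus error and the writev failures are its TCP peers seeing the dropped connections. The numbers can be decoded on the nodes themselves with a small sketch like this one; it assumes Linux numbering, and the python3 step runs only if python3 happens to be installed:

```shell
#!/bin/bash
# Decode the numbers in the error output. Assumes Linux numbering;
# signal and errno values differ on other operating systems.

# The shell builtin "kill -l N" prints the name of signal number N.
echo "Signal 7 is SIG$(kill -l 7)"

# There is no shell builtin for errno values; if python3 is available,
# errno.errorcode and os.strerror() give the symbolic name and message.
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import errno, os; print("errno 104 is", errno.errorcode[104], "-", os.strerror(104))'
fi
```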

I configured the PE as suggested in [1], except for "allocation_rule",
which I changed to "$fill_up", following O. Lehto [2].

ompi_info reports the gridengine components correctly:
                [ocf@master TEST2]$ ompi_info | grep gridengine
                   MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2)
                   MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2)
and the queue includes the PE:
                   [ocf@master TEST2]$ qconf -sq ocf.q | grep pe_list
                   pe_list               mpich-uni mpich-multi openmp

Has anyone had similar problems with SGE?

Thanks for your attention.

Marcelo Garcia
[1] http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
[2] http://staff.csc.fi/~oplehto/openmpi-gridengine/
=========== PE openmp ===========================================
[ocf@master TEST2]$ qconf -sp openmp
pe_name           openmp
slots             300
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
=========== PE openmp ===========================================
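
The PE above sets control_slaves TRUE and job_is_first_task FALSE, the two settings used by the example PE in the FAQ [1] for tight integration. To confirm the same on another PE attached to the queue (for example mpich-multi), a small sketch like the following could be used; check_pe is a hypothetical helper name, and the qconf call only runs if qconf is on the PATH:

```shell
#!/bin/bash
# Hypothetical helper: read a PE definition (as printed by "qconf -sp <pe>")
# on stdin and check the two settings tight integration depends on.
check_pe() {
    awk '
        $1 == "control_slaves"    { cs = $2 }
        $1 == "job_is_first_task" { jf = $2 }
        END {
            ok = (cs == "TRUE" && jf == "FALSE")
            print (ok ? "PE looks OK" : "PE needs control_slaves TRUE and job_is_first_task FALSE")
            exit (ok ? 0 : 1)
        }'
}

# Only query SGE when qconf is available on this host.
if command -v qconf >/dev/null 2>&1; then
    qconf -sp openmp | check_pe
fi
```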

=========== submission script  ===========================================
[ocf@master TEST2]$ more test2.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N DLPOLY2
#$ -q ocf.q
#$ -cwd
#$ -o dlpoly.o
#$ -e dlpoly.e
#$ -pe openmp 64
#$ -V

# This makes no difference; it always aborts.
export PATH=/home/ocf/ompi/bin:${PATH}
export LD_LIBRARY_PATH=/home/ocf/ompi/lib:${LD_LIBRARY_PATH}

DLPOLY_TEST=/home/ocf/SRIFBENCH/DLPOLY3/data/TEST2
MPIRUN=/home/ocf/ompi/bin/mpirun

cd ${DLPOLY_TEST}
${MPIRUN} -np $NSLOTS /home/ocf/SRIFBENCH/DLPOLY3/execute/DLPOLY.Y
=========== submission script  ===========================================
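
One way to compare the SGE run with the working interactive run is to check what SGE actually granted. The sketch below assumes SGE's usual $PE_HOSTFILE format (one line per host: hostname, slot count, queue, processor range); the helper names pe_to_ompi_hostfile and pe_total_slots and the output file sge_hosts.txt are hypothetical. Whether Open MPI 1.2b2 honours --hostfile inside an SGE job is not verified here, so the mpirun line is left commented out.

```shell
#!/bin/bash
# Hypothetical helper: convert an SGE PE_HOSTFILE into Open MPI hostfile
# syntax ("host slots=N"), one line per host.
pe_to_ompi_hostfile() {
    awk '{ print $1 " slots=" $2 }' "$1"
}

# Hypothetical helper: total slots listed in a PE_HOSTFILE.
pe_total_slots() {
    awk '{ n += $2 } END { print n + 0 }' "$1"
}

# Inside an SGE parallel job the scheduler sets PE_HOSTFILE and NSLOTS.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    echo "NSLOTS=${NSLOTS:-unset}, slots in PE_HOSTFILE=$(pe_total_slots "$PE_HOSTFILE")"
    pe_to_ompi_hostfile "$PE_HOSTFILE" > ./sge_hosts.txt
    # For comparison with the interactive run (untested under SGE):
    # ${MPIRUN} -np $NSLOTS --hostfile ./sge_hosts.txt /home/ocf/SRIFBENCH/DLPOLY3/execute/DLPOLY.Y
fi
```

With "$fill_up" the slot counts per host may differ from the 4-per-node layout of nodes16_slots4.txt, which is one difference worth looking at.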
