Hi, 

  I just compiled Open MPI version 1.2.5 with the following options:


./configure --prefix=/u/local/mpi/openmpi/1.2.5 --with-openib=/usr/local \
    --enable-static --disable-shared CC=icc CXX=icpc F77=ifort FC=ifort --with-sge

on an x86_64 machine with an InfiniBand interconnect, the OFED software stack,
and CentOS 5.

Everything works fine when I launch jobs from the command line, but when I
submit through SGE 6.1u3 I get the following error:

error: executing task of job 23081 failed: 
[n99:01442] ERROR: A daemon on node n99 failed to start as expected.
[n99:01442] ERROR: There may be more information available from
[n99:01442] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[n99:01442] ERROR: If the problem persists, please restart the
[n99:01442] ERROR: Grid Engine PE job
[n99:01442] ERROR: The daemon exited unexpectedly with status 1.


In my SGE job script I have:
#$ -pe orte 2


/u/local/mpi/openmpi/1.2.5/bin/mpiexec -n 2 -machinefile $TMPDIR/nodefile  \
         /u/home2/ppk/MPI/C/executablename  >& output



n99:/work/23081.1.campus.q {1002}$ cat nodefile 
n99  slots=1
n15  slots=1
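
As a sanity check, the slot count in this nodefile can be compared against the "-n 2" passed to mpiexec. The following is only a minimal sketch: count_slots is a hypothetical helper (not part of Open MPI or SGE), and it assumes that a host line without a slots= field counts as one slot.

```shell
#!/bin/sh
# Hypothetical helper: sum the slots=N entries of a machinefile whose
# lines look like "n99  slots=1".
count_slots() {
    awk '
        NF == 0 { next }            # skip blank lines
        {
            n = 1                   # assumption: one slot if no slots= field
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
            total += n
        }
        END { print total + 0 }
    ' "$1"
}

# Usage inside the job script:
#   count_slots "$TMPDIR/nodefile"
```

For the nodefile shown here the total should agree with the two slots requested by "#$ -pe orte 2".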


n99:/work/23081.1.campus.q {1003}$ qconf -sp orte
pe_name           orte
slots             360
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task TRUE
urgency_slots     min


I have been combing through the archives for similar errors. I have found a
few reports, but no satisfactory answer. Does anyone know why this happens?



i02:/u/local/mpi/openmpi/1.2.5/bin {1049}$ ./ompi_info | grep tm
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
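
The grep for "tm" may not be the most telling check, since as far as I understand the tm components belong to Torque/PBS rather than SGE. A possibly more relevant check is sketched below, assuming that SGE support built with --with-sge shows up in the ompi_info output as components named "gridengine". has_gridengine is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Assumption: SGE support is listed by ompi_info as "gridengine" components.
# has_gridengine reads ompi_info output on stdin and succeeds if any
# line mentions gridengine.
has_gridengine() {
    grep -q 'gridengine'
}

# Usage (path taken from the configure --prefix above):
#   /u/local/mpi/openmpi/1.2.5/bin/ompi_info | has_gridengine \
#       && echo "gridengine components found" \
#       || echo "no gridengine components found"
```

If nothing matches, the build may not have picked up SGE support at all, which would be worth ruling out before looking at the PE configuration.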

I also tried the pre-release 1.2.6rc3, with the same results.


Prakashan