I followed your direction and it works fine now. Thank you very much. Appreciate it.
Prakashan

i01:~ {1005}$ qconf -sp orte
pe_name            orte
slots              360
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min

-----Original Message-----
From: users-boun...@open-mpi.org on behalf of Pak Lui
Sent: Thu 4/3/2008 1:37 PM
To: Open MPI Users
Subject: Re: [OMPI users] SGE error: executing task of job 22966 failed:

Hi Prakashan,

I believe it might be something in the PE settings. Could you try this:

Change this parameter in the 'orte' parallel environment from:

> job_is_first_task  TRUE

to:

> job_is_first_task  FALSE

If you have this set to TRUE, it takes away an available slot in your job, so it might prevent an SGE 'task' from launching on one of your SGE nodes.

Korambath, Prakashan wrote:
> Hi,
>
> I just compiled OpenMPI version 1.2.5 with the options
>
> ./configure --prefix=/u/local/mpi/openmpi/1.2.5
> --with-openib=/usr/local --enable-static --disable-shared CC=icc
> CXX=icpc F77=ifort FC=ifort --with-sge
>
> on an X86_64 machine with Infiniband interconnect, OFED software, and
> CentOS 5.
>
> Everything works fine when I launch jobs from the command line, but when
> I submit through SGE 6.1U3 I get the following error:
>
> error: executing task of job 23081 failed:
> [n99:01442] ERROR: A daemon on node n99 failed to start as expected.
> [n99:01442] ERROR: There may be more information available from
> [n99:01442] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [n99:01442] ERROR: If the problem persists, please restart the
> [n99:01442] ERROR: Grid Engine PE job
> [n99:01442] ERROR: The daemon exited unexpectedly with status 1.
>
> In my command script for SGE I have
>
> #$ -pe orte 2
>
> /u/local/mpi/openmpi/1.2.5/bin/mpiexec -n 2 -machinefile $TMPDIR/nodefile \
> /u/home2/ppk/MPI/C/executablename >& output
>
> n99:/work/23081.1.campus.q {1002}$ cat nodefile
> n99 slots=1
> n15 slots=1
>
> n99:/work/23081.1.campus.q {1003}$ qconf -sp orte
> pe_name            orte
> slots              360
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  TRUE
> urgency_slots      min
>
> I am combing through the archives for similar errors. I have seen
> some, but no satisfactory answer. Does anyone know why this happens?
>
> i02:/u/local/mpi/openmpi/1.2.5/bin {1049}$ ./ompi_info | grep tm
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
>
> I also tried the pre-release 1.2.6rc3, with the same results.
>
> Prakashan
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
- Pak Lui
pak....@sun.com
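
For anyone who finds this thread later: the setting discussed above can be checked with a small script before submitting a job. The following is only a rough sketch, not part of Open MPI or SGE. The function names parse_pe_config and first_task_takes_slot are made up for illustration, and the sample text is the 'orte' PE output quoted in the original report. It assumes `qconf -sp` prints one "attribute value" pair per line, as it does in this thread.

```python
def parse_pe_config(text):
    """Parse `qconf -sp <pe>` style output into a dict of attribute -> value.

    Assumes one whitespace-separated "attribute value" pair per line.
    """
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def first_task_takes_slot(settings):
    """Return True if job_is_first_task is TRUE in the parsed PE settings."""
    return settings.get("job_is_first_task", "").upper() == "TRUE"


# Sample input: the 'orte' PE as quoted in the original report.
SAMPLE = """\
pe_name            orte
slots              360
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
"""

if __name__ == "__main__":
    pe = parse_pe_config(SAMPLE)
    if first_task_takes_slot(pe):
        print("job_is_first_task is TRUE; consider setting it to FALSE")
    else:
        print("job_is_first_task is not TRUE")
```

On a real cluster the text would come from running `qconf -sp orte` rather than from the hard-coded sample, and the value can be edited with `qconf -mp orte`, which opens the PE definition in an editor.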