The most likely problem is a path or library issue affecting the location of the OpenMPI/OpenRTE executables when running under batch versus interactively. We see this from time to time when the shell startup files differ between those two modes. You might try running printenv in a batch job and in an interactive shell and comparing the output to see whether any differences exist.
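For example, a quick check could look something like the following (just a sketch; the script name, output file names, and single-node request are assumptions about your setup, not anything specific to Open MPI):

  #!/bin/sh
  # env_check.sh -- hypothetical test script; submit with: qsub env_check.sh
  #PBS -l nodes=1
  # Record the environment the batch shell actually sees
  printenv | sort > $HOME/env.batch
  which mpirun >> $HOME/env.batch 2>&1

Then, from an interactive login on the same node:

  printenv | sort > $HOME/env.interactive
  which mpirun >> $HOME/env.interactive 2>&1
  diff $HOME/env.interactive $HOME/env.batch

Differences in PATH or LD_LIBRARY_PATH between the two files would be the first thing to look at.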
As far as I know, there are no compatibility issues with Torque at this time.

Ralph


On 5/1/07 8:54 AM, "Ole Holm Nielsen" <ole.h.niel...@fysik.dtu.dk> wrote:

> We have built OpenMPI 1.2.1 with support for Torque 2.1.8 and its
> Task Manager interface. We use the PGI 6.2-4 compiler and the
> --with-tm option as described in
> http://www.open-mpi.org/faq/?category=building#build-rte-tm
> for building an OpenMPI RPM on a Pentium-4 machine running CentOS 4.4
> (RHEL4U4 clone). The TM interface seems to be available as it should:
>
> # ompi_info | grep tm
>           MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
>              MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>              MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>
> When we submit a Torque batch job running the example code in
> openmpi-1.2.1/examples/hello_c.c we get this error message:
>
> /usr/local/openmpi-1.2.1-pgi/bin/mpirun -np 2 -machinefile $PBS_NODEFILE hello_c
> [u126.dcsc.fysik.dtu.dk:11981] pls:tm: failed to poll for a spawned proc, return status = 17002
> [u126.dcsc.fysik.dtu.dk:11981] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
> [u126.dcsc.fysik.dtu.dk:11981] mpirun: spawn failed with errno=-11
>
> When we run the same code in an interactive (non-Torque) shell the
> hello_c code works correctly:
>
> # /usr/local/openmpi-1.2.1-pgi/bin/mpirun -np 2 -machinefile hostfile hello_c
> Hello, world, I am 0 of 2
> Hello, world, I am 1 of 2
>
> To prove that the Torque TM interface is working correctly we also make this
> test within the Torque batch job using the Torque pbsdsh command:
>
> pbsdsh hostname
> u126.dcsc.fysik.dtu.dk
> u113.dcsc.fysik.dtu.dk
>
> So obviously something is broken between Torque 2.1.8 and OpenMPI 1.2.1
> with respect to the TM interface, whereas either one alone seems to work
> correctly. Can anyone suggest a solution to this problem ?
>
> I wonder if this problem may be related to this list thread:
> http://www.open-mpi.org/community/lists/users/2007/04/3028.php
>
> Details of configuration:
> -------------------------
>
> We use the buildrpm.sh script from
> http://www.open-mpi.org/software/ompi/v1.2/srpm.php
> and change the following options in the script:
>
> prefix="/usr/local/openmpi-1.2.1-pgi"
>
> configure_options="--with-tm=/usr/local FC=pgf90 F77=pgf90 CC=pgcc CXX=pgCC
>   CFLAGS=-Msignextend CXXFLAGS=-Msignextend --with-wrapper-cflags=-Msignextend
>   --with-wrapper-cxxflags=-Msignextend FFLAGS=-Msignextend FCFLAGS=-Msignextend
>   --with-wrapper-fflags=-Msignextend --with-wrapper-fcflags=-Msignextend"
>
> rpmbuild_options=${rpmbuild_options}" --define 'install_in_opt 0' --define
>   'install_shell_scripts 1' --define 'install_modulefile 0'"
> rpmbuild_options=${rpmbuild_options}" --define '_prefix ${prefix}'"
>
> build_single=yes