How did you configure OMPI? Did you include --with-tm so that the native Torque support was built? If you did, then you shouldn't need to add a -machinefile to your cmd line, as we'll automatically pick up the allocation.
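One quick way to check, assuming ompi_info from the same 1.4.4 install is in your PATH, is to look for a "tm" component in the plm (launcher) and ras (allocation) frameworks. A rough sketch; the helper name has_tm_support is only for illustration, and the pattern assumes the usual "MCA <framework>: <component>" layout of ompi_info output:

```shell
#!/bin/sh
# Succeeds if the ompi_info output on stdin lists a "tm" component
# for the plm (launcher) or ras (allocation) framework.
# has_tm_support is a hypothetical helper, not part of Open MPI.
has_tm_support() {
    grep -Eq 'MCA (plm|ras): tm( |$)'
}

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "tm support found: mpirun can launch through Torque"
    else
        echo "no tm support: mpirun will fall back to rsh/ssh"
    fi
else
    echo "ompi_info not in PATH; load the Open MPI module first"
fi
```

If no tm lines show up, the build most likely needs to be reconfigured with --with-tm pointing at your Torque installation.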
If you run your second way:

> /appl/mpi/openmpi/1.4.4/bin/mpirun -n $NUMPROCS -machinefile ./hosts_openmpi
> /appl/mpi/openmpi/1.4.4/bin/mpitests-IMB-MPI1 runs without problem.

then mpirun automatically assigns the required paths because you used an absolute path to mpirun. However, this only occurs if you are using the rsh launcher instead of the Torque one, so I suspect you forgot to include the native Torque support.

The problem is that without the native support, Torque doesn't know about the orteds (as they are launched via rsh instead of Torque), and so Torque can't forward the environment like it is supposed to do.

On Mar 26, 2012, at 2:08 AM, giggzounet wrote:

> Hi,
>
> My problem:
> On our cluster, openmpi 1.4.4 is installed. We are using the module
> environment so I have created a module file to set up openmpi:
> prepend-path PATH /appl/mpi/openmpi/1.4.4/bin
> prepend-path LD_LIBRARY_PATH /appl/mpi/openmpi/1.4.4/lib
> prepend-path MANPATH /appl/mpi/openmpi/1.4.4/share/man
> setenv MPI_BIN /appl/mpi/openmpi/1.4.4/bin
> setenv MPI_SYSCONFIG /appl/mpi/openmpi/1.4.4/etc
> setenv MPI_INCLUDE /appl/mpi/openmpi/1.4.4/include
> setenv MPI_LIB /appl/mpi/openmpi/1.4.4/lib
> setenv MPI_MAN /appl/mpi/openmpi/1.4.4/share/man
> setenv MPI_COMPILER openmpi-x86_64
> setenv MPI_SUFFIX _openmpi
> setenv MPI_HOME /appl/mpi/openmpi/1.4.4
>
> This openmpi module loads without problem and mpirun, orted...are in the PATH.
>
> Now I want to start a pbs job:
> #!/bin/bash
> #PBS -N mpi-test
> #PBS -j oe
> #PBS -m abe
> #PBS -l nodes=2:ppn=2
> #PBS -l walltime=2:00:00
> #PBS -q long
> module list
> module unload mpi/intel-mpi/2012
> module load mpi/openmpi/1.4.4
> module list
> cd $PBS_O_WORKDIR
> cat $PBS_NODEFILE > hosts_openmpi
> mpirun -n $NUMPROCS -machinefile ./hosts_openmpi mpitests-IMB-MPI1
>
>
> And I get:
> bash: orted: command not found
> --------------------------------------------------------------------------
> A daemon (pid 7399) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
>
>
> It is very strange.../appl/mpi/openmpi/1.4.4/bin/ is in the PATH IN the pbs
> environment (I check that with env in a pbs job). But it doesn't work...
>
> /appl/mpi/openmpi/1.4.4/bin/mpirun -n $NUMPROCS -machinefile ./hosts_openmpi
> /appl/mpi/openmpi/1.4.4/bin/mpitests-IMB-MPI1 runs without problem.
>
> So I don't understand where I did an error...If someone could help me...
> Thx a lot,
> Best regards,
> Guillaume