How did you configure OMPI? Did you include --with-tm so that the native Torque 
support was built? If you did, then you shouldn't need to add a -machinefile to 
your cmd line, as we'll automatically pick up the allocation.
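
One way to check whether your build has the Torque (tm) components is to look 
at the ompi_info output. A rough sketch, assuming the ompi_info from the same 
1.4.4 install is on your PATH and that the tm components are listed under the 
ras and plm frameworks; the helper name has_tm_support is made up for this 
example:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if ompi_info-style output on stdin
# lists a "tm" component for the ras or plm framework.
has_tm_support() {
    grep -Eq 'MCA (ras|plm): tm( |$)'
}

# Only query the installation if ompi_info is actually available.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "tm components found: native Torque support is built in"
    else
        echo "no tm components found: OMPI was built without --with-tm"
    fi
fi
```

If nothing turns up, reconfigure with --with-tm pointing at your Torque 
install and rebuild.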

If you run it the second way:

> /appl/mpi/openmpi/1.4.4/bin/mpirun -n $NUMPROCS -machinefile ./hosts_openmpi 
> /appl/mpi/openmpi/1.4.4/bin/mpitests-IMB-MPI1 runs without problem.

then mpirun automatically assigns the required paths because you used an 
absolute path to mpirun. However, this only occurs if you are using the rsh 
launcher instead of the Torque one, so I suspect you forgot to include the 
native Torque support.
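
For the rsh launcher, giving the absolute path to mpirun has the same effect 
as passing --prefix with the install directory. A small sketch of that 
equivalence, using the paths from your mail; the helper name ompi_prefix_of is 
made up for this example, and the script only prints the resulting command 
rather than running it:

```shell
#!/bin/bash
# Hypothetical helper: derive the install prefix from an absolute
# mpirun path by dropping the trailing "bin/mpirun".
ompi_prefix_of() {
    dirname "$(dirname "$1")"
}

prefix=$(ompi_prefix_of /appl/mpi/openmpi/1.4.4/bin/mpirun)

# Print (do not execute) the mpirun call that spells the prefix out
# explicitly. $NUMPROCS is left unexpanded on purpose.
printf 'mpirun --prefix %s -n "$NUMPROCS" -machinefile ./hosts_openmpi mpitests-IMB-MPI1\n' "$prefix"
```

Either form is only a workaround for an rsh-launched job; it does not replace 
building the Torque support.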

The problem is that without the native support, Torque doesn't know about the 
orteds (as they are launched via rsh instead of Torque), and so Torque can't 
forward the environment as it is supposed to.
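
Once OMPI is rebuilt with the Torque support, the job script can drop the 
hosts file and the -machinefile option. A sketch under a few assumptions: 
your script doesn't show where NUMPROCS is set, so here it is counted from 
$PBS_NODEFILE (Torque writes one line per allocated slot); count_slots is a 
made-up helper name; and the module load lines from your script would go 
where the comment indicates:

```shell
#!/bin/bash
#PBS -N mpi-test
#PBS -j oe
#PBS -l nodes=2:ppn=2
#PBS -l walltime=2:00:00
#PBS -q long

# Hypothetical helper: count the non-empty lines (slots) in a node file.
count_slots() {
    grep -c . "$1"
}

# (module unload / module load lines as in the original script go here)

# Only launch when running inside a Torque job with mpirun available.
if [ -n "${PBS_NODEFILE:-}" ] && command -v mpirun >/dev/null 2>&1; then
    cd "$PBS_O_WORKDIR" || exit 1
    NUMPROCS=$(count_slots "$PBS_NODEFILE")
    # No -machinefile: with tm support mpirun reads the allocation itself.
    mpirun -n "$NUMPROCS" mpitests-IMB-MPI1
fi
```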


On Mar 26, 2012, at 2:08 AM, giggzounet wrote:

> Hi,
> 
> My problem:
> On our cluster, openmpi 1.4.4 is installed. We are using the module 
> environment so I have created a module file to set up openmpi:
> prepend-path PATH /appl/mpi/openmpi/1.4.4/bin
> prepend-path LD_LIBRARY_PATH /appl/mpi/openmpi/1.4.4/lib
> prepend-path MANPATH /appl/mpi/openmpi/1.4.4/share/man
> setenv                  MPI_BIN         /appl/mpi/openmpi/1.4.4/bin
> setenv                  MPI_SYSCONFIG   /appl/mpi/openmpi/1.4.4/etc
> setenv                  MPI_INCLUDE     /appl/mpi/openmpi/1.4.4/include
> setenv                  MPI_LIB         /appl/mpi/openmpi/1.4.4/lib
> setenv                  MPI_MAN         /appl/mpi/openmpi/1.4.4/share/man
> setenv                  MPI_COMPILER    openmpi-x86_64
> setenv                  MPI_SUFFIX      _openmpi
> setenv                  MPI_HOME        /appl/mpi/openmpi/1.4.4
> 
> This openmpi module loads without problem and mpirun, orted...are in the PATH.
> Now I want to start a pbs job:
> #!/bin/bash
> #PBS -N mpi-test
> #PBS -j oe
> #PBS -m abe
> #PBS -l nodes=2:ppn=2
> #PBS -l walltime=2:00:00
> #PBS -q long
> module list
> module unload mpi/intel-mpi/2012
> module load mpi/openmpi/1.4.4
> module list
> cd $PBS_O_WORKDIR
> cat $PBS_NODEFILE > hosts_openmpi
> mpirun -n $NUMPROCS -machinefile ./hosts_openmpi mpitests-IMB-MPI1
> 
> 
> And I get:
> bash: orted: command not found
> --------------------------------------------------------------------------
> A daemon (pid 7399) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
> 
> There may be more information reported by the environment (see above).
> 
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> 
> 
> 
> It is very strange.../appl/mpi/openmpi/1.4.4/bin/ is in the PATH IN the pbs 
> environment (I check that with env in a pbs job). But it doesn't work...
> 
> /appl/mpi/openmpi/1.4.4/bin/mpirun -n $NUMPROCS -machinefile ./hosts_openmpi 
> /appl/mpi/openmpi/1.4.4/bin/mpitests-IMB-MPI1 runs without problem.
> 
> So I don't understand where I did an error...If someone could help me...
> Thx a lot,
> Best regards,
> Guillaume
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

