Hi,

My problem:
On our cluster, openmpi 1.4.4 is installed. We use the environment modules
system, so I have created a module file to set up openmpi:
prepend-path PATH /appl/mpi/openmpi/1.4.4/bin
prepend-path LD_LIBRARY_PATH /appl/mpi/openmpi/1.4.4/lib
prepend-path MANPATH /appl/mpi/openmpi/1.4.4/share/man
setenv                  MPI_BIN         /appl/mpi/openmpi/1.4.4/bin
setenv                  MPI_SYSCONFIG   /appl/mpi/openmpi/1.4.4/etc
setenv                  MPI_INCLUDE     /appl/mpi/openmpi/1.4.4/include
setenv                  MPI_LIB         /appl/mpi/openmpi/1.4.4/lib
setenv                  MPI_MAN         /appl/mpi/openmpi/1.4.4/share/man
setenv                  MPI_COMPILER    openmpi-x86_64
setenv                  MPI_SUFFIX      _openmpi
setenv                  MPI_HOME        /appl/mpi/openmpi/1.4.4

This openmpi module loads without problems, and mpirun, orted, etc. are in
the PATH.
Now I want to start a PBS job:
#!/bin/bash
#PBS -N mpi-test
#PBS -j oe
#PBS -m abe
#PBS -l nodes=2:ppn=2
#PBS -l walltime=2:00:00
#PBS -q long
module list
module unload mpi/intel-mpi/2012
module load mpi/openmpi/1.4.4
module list
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > hosts_openmpi
mpirun -n $NUMPROCS -machinefile ./hosts_openmpi mpitests-IMB-MPI1


And I get:
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 7399) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished



It is very strange: /appl/mpi/openmpi/1.4.4/bin/ is in the PATH inside the
PBS environment (I checked that with env in a PBS job), but it still doesn't
work.
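The PATH seen by env inside the job is the one on the first node, whereas the "bash: orted: command not found" line is printed by a shell started on a remote node. A minimal sketch of a check for that case follows. It assumes passwordless ssh to the nodes and reuses the hosts_openmpi file written by the job script; REMOTE_SH and HOSTS_FILE are hypothetical override variables introduced only for this example.

```shell
#!/bin/bash
# REMOTE_SH is a hypothetical hook (not an Open MPI or PBS setting);
# by default the check goes through ssh.
REMOTE_SH=${REMOTE_SH:-ssh}

# Ask a non-interactive shell on host $1 where it finds orted.
check_orted() {
    local host=$1 found
    # Single quotes: the lookup is done by the remote shell, not locally.
    if found=$("$REMOTE_SH" "$host" 'command -v orted' 2>/dev/null) && [ -n "$found" ]; then
        echo "$host: $found"
    else
        echo "$host: orted NOT in PATH"
        return 1
    fi
}

# Assumption: the machinefile written by the job script is in the
# current directory. The loop is skipped if the file is not readable.
hosts_file=${HOSTS_FILE:-./hosts_openmpi}
if [ -r "$hosts_file" ]; then
    for host in $(sort -u "$hosts_file"); do
        check_orted "$host"
    done
fi
```

If a node reports that orted is not in the PATH, the module is probably not loaded for non-interactive shells on that node.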

With full paths, however, the following command runs without problems:
/appl/mpi/openmpi/1.4.4/bin/mpirun -n $NUMPROCS -machinefile ./hosts_openmpi /appl/mpi/openmpi/1.4.4/bin/mpitests-IMB-MPI1
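For reference, the mpirun man page says that running mpirun through an absolute pathname is equivalent to passing --prefix with the last subdirectory of that pathname removed, which may be why the full-path call finds orted on the remote nodes. A minimal sketch of that derivation follows. The helper name prefix_from_mpirun is made up for this example, and the command is only printed, not run.

```shell
#!/bin/bash
# Hypothetical helper: derive the install prefix from an absolute mpirun
# path by dropping the file name and the bin directory above it.
prefix_from_mpirun() {
    dirname "$(dirname "$1")"
}

MPIRUN=/appl/mpi/openmpi/1.4.4/bin/mpirun
PREFIX=$(prefix_from_mpirun "$MPIRUN")

# Print the equivalent --prefix form of the command instead of running it.
# $NUMPROCS is left unexpanded because the job script does not define it.
echo "mpirun --prefix $PREFIX -n \$NUMPROCS -machinefile ./hosts_openmpi mpitests-IMB-MPI1"
```

Whether the --prefix form behaves the same as the full-path call on this cluster would still need to be tried in a real job.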

So I don't understand where I made a mistake. Could someone help me?
Thanks a lot,
Best regards,
Guillaume
