Arghhhhhh. You're right... thx a lot!
On 26 March 2012 at 15:36, Ralph Castain <r...@open-mpi.org> wrote:

> How did you configure OMPI? Did you include --with-tm so that the native
> Torque support was built? If you did, then you shouldn't need to add a
> -machinefile to your cmd line, as we'll automatically pick up the allocation.
>
> If you run your second way:
>
> > /appl/mpi/openmpi/1.4.4/bin/mpirun -n $NUMPROCS -machinefile ./hosts_openmpi /appl/mpi/openmpi/1.4.4/bin/mpitests-IMB-MPI1 runs without problem.
>
> then mpirun automatically assigns the required paths because you used an
> absolute path to mpirun. However, this only occurs if you are using the rsh
> launcher instead of the Torque one, so I suspect you forgot to include the
> native Torque support.
>
> The problem is that without the native support, Torque doesn't know about
> the orteds (as they are launched via rsh instead of Torque), and so Torque
> can't forward the environment like it is supposed to do.
>
>
> On Mar 26, 2012, at 2:08 AM, giggzounet wrote:
>
> > Hi,
> >
> > My problem:
> > On our cluster, openmpi 1.4.4 is installed. We are using the module environment, so I have created a module file to set up openmpi:
> > prepend-path PATH /appl/mpi/openmpi/1.4.4/bin
> > prepend-path LD_LIBRARY_PATH /appl/mpi/openmpi/1.4.4/lib
> > prepend-path MANPATH /appl/mpi/openmpi/1.4.4/share/man
> > setenv MPI_BIN /appl/mpi/openmpi/1.4.4/bin
> > setenv MPI_SYSCONFIG /appl/mpi/openmpi/1.4.4/etc
> > setenv MPI_INCLUDE /appl/mpi/openmpi/1.4.4/include
> > setenv MPI_LIB /appl/mpi/openmpi/1.4.4/lib
> > setenv MPI_MAN /appl/mpi/openmpi/1.4.4/share/man
> > setenv MPI_COMPILER openmpi-x86_64
> > setenv MPI_SUFFIX _openmpi
> > setenv MPI_HOME /appl/mpi/openmpi/1.4.4
> >
> > This openmpi module loads without problem, and mpirun, orted... are in the PATH.
> >
> > Now I want to start a pbs job:
> > #!/bin/bash
> > #PBS -N mpi-test
> > #PBS -j oe
> > #PBS -m abe
> > #PBS -l nodes=2:ppn=2
> > #PBS -l walltime=2:00:00
> > #PBS -q long
> > module list
> > module unload mpi/intel-mpi/2012
> > module load mpi/openmpi/1.4.4
> > module list
> > cd $PBS_O_WORKDIR
> > cat $PBS_NODEFILE > hosts_openmpi
> > mpirun -n $NUMPROCS -machinefile ./hosts_openmpi mpitests-IMB-MPI1
> >
> > And I get:
> > bash: orted: command not found
> > --------------------------------------------------------------------------
> > A daemon (pid 7399) died unexpectedly with status 127 while attempting
> > to launch so we are aborting.
> >
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> > --------------------------------------------------------------------------
> > mpirun: clean termination accomplished
> >
> > It is very strange... /appl/mpi/openmpi/1.4.4/bin/ is in the PATH IN the pbs environment (I checked that with env in a pbs job). But it doesn't work...
> >
> > /appl/mpi/openmpi/1.4.4/bin/mpirun -n $NUMPROCS -machinefile ./hosts_openmpi /appl/mpi/openmpi/1.4.4/bin/mpitests-IMB-MPI1 runs without problem.
> >
> > So I don't understand where I made an error... If someone could help me...
> >
> > Thx a lot,
> > Best regards,
> > Guillaume
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
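
Ralph's diagnosis can be checked before launching anything. The following is only a minimal sketch, not something posted in the thread: `has_tm_support` is a hypothetical helper name, and it assumes that `ompi_info` lists components in the form `MCA plm: tm (...)`, so adjust the pattern if your build prints them differently. The fallback branch uses `--prefix`, which should have the same effect as calling mpirun by its absolute path when the rsh launcher is in use.

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI): reads `ompi_info` output on
# stdin and succeeds only if a "tm" (Torque) launcher component is listed.
# Assumption: component lines look like "MCA plm: tm (MCA v2.0, ...)".
has_tm_support() {
    grep -Eq 'MCA[[:space:]]+plm:[[:space:]]+tm([[:space:]]|$)'
}

# Possible use inside the PBS job script (assumes ompi_info is on the PATH
# after "module load mpi/openmpi/1.4.4"):
#
#   if ompi_info | has_tm_support; then
#       # Native Torque support: no -machinefile needed.
#       mpirun -n "$NUMPROCS" mpitests-IMB-MPI1
#   else
#       # rsh launcher: tell mpirun where Open MPI lives on the remote nodes.
#       mpirun --prefix /appl/mpi/openmpi/1.4.4 -n "$NUMPROCS" \
#              -machinefile ./hosts_openmpi mpitests-IMB-MPI1
#   fi
```

If the check fails, the longer-term fix suggested above is to rebuild Open MPI with `--with-tm` so that the orteds are started through Torque rather than rsh.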