Simplest solution: add -bynode to your mpirun command line.
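Applied to the script quoted below, that would make the launch line "mpirun -bynode -am ft-enable-cr -machinefile hostfile ex5mpi testData". I have not tested this against a --without-tm build, so treat it as a suggestion to try rather than a confirmed fix. To show what the flag changes, here is a rough sketch that does not call mpirun at all: it only mimics by-slot placement (ranks fill the first node's slots before moving on) versus by-node placement (ranks cycle round-robin over the nodes). The node names node1/node2 and the file name hostfile.example are made up for the illustration.

```shell
#!/bin/sh
# Illustration only: mimics rank placement, does not invoke mpirun.
# node1/node2 and hostfile.example are hypothetical names.
printf '%s\n' node1 node1 node1 node1 node2 node2 node2 node2 > hostfile.example

# By slot: ranks follow the hostfile order, filling node1 before node2.
byslot=$(awk '{ printf "%s%s", (NR > 1 ? " " : ""), $1 }' hostfile.example)

# By node: one rank per node per pass, cycling until all slots are used.
bynode=$(awk '
    !($1 in slots) { order[++n] = $1 }   # remember nodes in first-seen order
    { slots[$1]++; total++ }             # count slots per node
    END {
        r = 0
        while (r < total) {
            for (i = 1; i <= n; i++) {
                if (slots[order[i]] > 0) {
                    printf "%s%s", (r > 0 ? " " : ""), order[i]
                    slots[order[i]]--
                    r++
                }
            }
        }
        print ""
    }' hostfile.example)

echo "by slot: $byslot"
echo "by node: $bynode"
rm -f hostfile.example
```

With by-node placement the second node gets a rank on the very first pass, instead of only after the first node's four slots are full.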

On Feb 20, 2011, at 10:50 PM, DOHERTY, Greg wrote:

> In order to be able to checkpoint openmpi jobs with blcr, we have
> configured openmpi as follows
>
> ./configure --prefix=/data1/packages/openmpi/1.5.1-blcr-without-tm
> --disable-openib-connectx-xrc --disable-openib-rdmacm --with-ft=cr
> --enable-mpi-threads --enable-ft-thread --with-blcr=/usr
> --with-blcr-libdir=/usr/include --without-tm
>
> When used in conjunction with torque2.5.3, we are able to start the
> following job with 8 cores on one node, but if we try to start the same
> job with 4 cores on each of two nodes, the job starts 4 cores on the
> primary node, but not the remaining 4 cores on the second node.
>
> $ cat PBStest
> #!/bin/sh
> #PBS -c enabled
> #PBS -l walltime=25:00:00
> #PBS -l nodes=2:ppn=4
> #PBS -m ae
> #PBS -M g...@ansto.gov.au
> #PBS -N Prob8
> #PBS -r n
> #PBS -q blcrq
> source /etc/profile.d/00-modules.sh
> module load mpi/openmpi_1.5-blcr-without-tm
> NN=`cat $PBS_NODEFILE | wc -l`
> cd $PBS_O_WORKDIR
> cat $PBS_NODEFILE > hostfile
> cat $PBS_NODEFILE
> pwd
> echo "NN = $NN "
> date
> which mpirun
> cd $PBS_O_WORKDIR
> mpirun -am ft-enable-cr -machinefile hostfile ex5mpi testData
> --------------------------------------------------------------
> The hostfile correctly lists the primary node 4 times, and then the
> second node 4 times.
>
> When openmpi is built --with-tm, which is the default if --without-tm is
> not specified, the job correctly starts on the 8 cores spread across the
> 4 nodes.
>
> blcr needs cr_mpirun to start the job without torque support to be able
> to checkpoint the mpi job correctly.
>
> My question is whether it is possible for the script above to be
> modified in order to start on multiple nodes if openmpi has been built
> with --without-tm and, if so, what needs to be added or deleted from the
> script?
> I have tried -mca plm ^tm with openmpi built --with-tm which also will
> not start the second 4 mpi ranks.
>
> Any suggestions gratefully accepted.
> Greg Doherty
> ANSTO