Open MPI and SLURM should work together just fine right out of the box. The
typical command progression is:

srun -n x -A
mpirun -n y .....


If you are running those commands and still seeing everything run on the head
node, then one of two things could be happening:

(a) you really aren't getting an allocation from slurm. Perhaps you don't
have slurm set up correctly and aren't actually seeing the allocation in your
environment. Run "printenv | grep SLURM" and see if you find the following
variables:
SLURM_NPROCS=8
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=
SLURM_CPU_BIND_LIST=
SLURM_MEM_BIND_VERBOSE=quiet
SLURM_MEM_BIND_TYPE=
SLURM_MEM_BIND_LIST=
SLURM_JOBID=47225
SLURM_NNODES=2
SLURM_NODELIST=odin[013-014]
SLURM_TASKS_PER_NODE=4(x2)
SLURM_SRUN_COMM_PORT=43206
SLURM_SRUN_COMM_HOST=odin

Obviously, your values will be different, but the SLURM_TASKS_PER_NODE and
SLURM_NODELIST variables really need to be there.
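If you want a quick sanity check for case (a), something like the following sketch will tell you whether those two critical variables are visible from the shell where you run mpirun (check_allocation is just a hypothetical helper name, not anything SLURM provides):

```shell
# Sanity check: are the variables Open MPI needs visible in this shell?
# check_allocation is a hypothetical helper, not part of SLURM itself.
check_allocation() {
    if [ -n "$SLURM_NODELIST" ] && [ -n "$SLURM_TASKS_PER_NODE" ]; then
        echo "allocation visible: $SLURM_NODELIST, $SLURM_TASKS_PER_NODE"
    else
        echo "no SLURM allocation in this environment"
    fi
}

check_allocation
```

Run it inside the srun allocation; if it reports no allocation, the problem is your slurm setup rather than Open MPI.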

(b) the master node is being included in your nodelist and you aren't
running enough MPI processes to need more nodes (i.e., the number of slots
on the master node is greater than or equal to the number of procs you
requested). You can force Open MPI not to run on your master node by adding
"--nolocal" to your command line.

Of course, if the master node is the only node in the nodelist, this will
cause mpirun to abort, as there is nothing else for it to use.
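One way to see whether case (b) applies: SLURM_TASKS_PER_NODE encodes the slot count per node (e.g. "4(x2)" means 4 tasks on each of 2 nodes), so you can compare the first node's slot count against the number of procs you plan to request. This is only a sketch; the helper name is hypothetical and the parsing assumes the uniform "N(xM)" form of the variable:

```shell
# Sketch: would all requested procs fit on the first (master) node?
# fits_on_first_node is a hypothetical helper; it assumes the uniform
# "N(xM)" format of SLURM_TASKS_PER_NODE (e.g. "4(x2)" = 4 slots/node).
fits_on_first_node() {
    nprocs=$1
    slots=${SLURM_TASKS_PER_NODE%%(*}   # "4(x2)" -> "4"
    if [ "$nprocs" -le "$slots" ]; then
        echo "all $nprocs procs fit on the first node; consider --nolocal"
    else
        echo "$nprocs procs will need more than the first node"
    fi
}

SLURM_TASKS_PER_NODE="4(x2)" fits_on_first_node 4
```

If the requested proc count fits on the first node, Open MPI has no reason to spill onto the second one, which looks exactly like "everything runs on the master node".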

Hope that helps
Ralph


On 1/18/07 11:03 PM, "Robert Bicknell" <robbickn...@gmail.com> wrote:

> I'm trying to get slurm and openmpi to work together on a debian, two
> node cluster.  Slurm and openmpi seem to work fine separately, but when
> I try to run a mpi program in a slurm allocation, all the processes get
> run on the master node, and not distributed to the second node. What am
> I doing wrong?
> 
> Bob

