Hi,

I have been trying to understand how to correctly launch hybrid
MPI/OpenMP (i.e. multi-threaded MPI) jobs with mpirun. I am quite
puzzled as to which command-line options are the correct ones to use.
The description in the mpirun man page is very confusing, and I could
not get what I wanted.

Some background: the cluster uses SGE, and I am using Open MPI 1.10.2
compiled with (and for) gcc 4.9.3. The MPI library was configured with
SGE support. The compute nodes have 32 cores each: two sockets of
Xeon E5-2698 v3 (16-core Haswell).

A colleague told me the following:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by node:PE=2 ./EXECUTABLE

I could see the executable using 200% CPU per process--that's good.
But there is one catch in the general case: "-map-by node" assigns the
MPI processes in a round-robin fashion (MPI rank 0 goes to node 0, MPI
rank 1 to node 1, and so on until every node has one process, then it
goes back to node 0, 1, ...).

Instead of the scenario above, I was trying to get the MPI processes
placed side by side (like the "fill_up" policy in the SGE scheduler),
i.e. fill node 0 first, then fill node 1, and so on. How do I do this
properly?

Here are a few attempts that failed:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE

or

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 ./EXECUTABLE

Both failed with the same error message:

--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive
was also given to map to an object level that cannot support that
directive.

Please specify a mapping level that has more than one cpu, or
else let us define a default mapping that will allow multiple
cpus-per-proc.
--------------------------------------------------------------------------

Another attempt was:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 -bind-to socket ./EXECUTABLE

Here's the error message:

--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a conflicting binding
policy was specified:

  #cpus-per-proc:  2
  type of cpus:    cores as cpus
  binding policy given: SOCKET

The correct binding policy for the given type of cpu is:

  correct binding policy:  bind-to core

This is the binding policy we would apply by default for this
situation, so no binding need be specified. Please correct the
situation and try again.
--------------------------------------------------------------------------

Clearly I am not understanding how this map-by works. Could somebody
help me? There is a partially written wiki article:

https://github.com/open-mpi/ompi/wiki/ProcessPlacement

but unfortunately it is also not clear to me.

-- 
Wirawan Purwanto
Computational Scientist, HPC Group
Information Technology Services
Old Dominion University
Norfolk, VA 23529
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
