Hello Open MPI Gurus,

As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to get
the same process and thread placement from different MPI stacks. For
example, I have a 28-core node (2 14-core Haswells), and I'd like to run
4 MPI processes with 7 OpenMP threads each. That is, I'd like 2 processes
per socket, with each process's OpenMP threads laid out on its own 7 cores.
Using a "hybrid Hello World" program, I can achieve this with Intel MPI
(after a lot of testing):

(1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 ./hello-hybrid.x | sort -g -k 18
srun.slurm: cluster configuration lacks support for cpu binding
Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27

Other than the odd fact that Process #0 seemed to start on Socket #1 (this
might be an artifact of how I'm trying to detect the CPU I'm on), this
looks reasonable: 14 threads on each socket, every thread on its own CPU,
and each process lays out its threads in a nice orderly fashion.

I'm trying to figure out how to do this with Open MPI (version 1.10.0) and
apparently I am just not quite good enough to figure it out. The closest
I've gotten is:

(1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket ./hello-hybrid.x | sort -g -k 18
Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20

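Obviously not right: on each socket, both processes report the same 7 CPUs
(0-6 on one socket, 14-20 on the other), so the threads are doubled up
while CPUs 7-13 and 21-27 never appear. Any ideas on how to help me learn?
The mpirun man page is a bit formidable in the pinning part, so maybe I've
missed an obvious answer.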
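
For anyone who wants to check output like the above mechanically rather
than by eye, here is a rough Python sketch. It is not part of
hello-hybrid.x; the names cpus_by_process and shared_cpus are made up for
illustration, and it assumes the exact "Hello from thread ... on CPU N"
line format shown in this post. It groups the reported CPUs by MPI process
and flags any CPU reported by more than one process.

```python
import re
from collections import defaultdict

# Assumed line format, copied from the hello-hybrid.x output in this post.
LINE_RE = re.compile(
    r"Hello from thread (\d+) out of (\d+) from process (\d+) out of (\d+) "
    r"on (\S+) on CPU (\d+)"
)


def cpus_by_process(text):
    """Map (host, MPI rank) -> set of CPU ids reported by that rank's threads."""
    layout = defaultdict(set)
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip launcher warnings, prompts, and blank lines
        rank, host, cpu = int(m.group(3)), m.group(5), int(m.group(6))
        layout[(host, rank)].add(cpu)
    return dict(layout)


def shared_cpus(layout):
    """Map (host, cpu) -> sorted ranks, only for CPUs reported by 2+ ranks."""
    owners = defaultdict(set)
    for (host, rank), cpus in layout.items():
        for cpu in cpus:
            owners[(host, cpu)].add(rank)
    return {key: sorted(ranks) for key, ranks in owners.items() if len(ranks) > 1}


# A few lines taken from the Open MPI run above, as a small demonstration.
sample = """\
Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
"""

layout = cpus_by_process(sample)
for (host, rank), cpus in sorted(layout.items()):
    print(f"{host} rank {rank}: CPUs {sorted(cpus)}")
for (host, cpu), ranks in sorted(shared_cpus(layout).items()):
    print(f"{host} CPU {cpu} is shared by ranks {ranks}")
```

Feeding it the full captured output of a run (in place of the sample
string) should give an empty shared_cpus result for a layout like the Intel
MPI one. Note that it only sees the CPU each thread happened to report at
print time, so it says nothing about whether the threads are actually
pinned there.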

Matt
-- 
Matt Thompson

Man Among Men
Fulcrum of History
