Hello Open MPI Gurus,

As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to get the same behavior in various stacks. For example, I have a 28-core node (two 14-core Haswells), and I'd like to run 4 MPI processes with 7 OpenMP threads each. Thus, I'd like 2 processes per socket, with each process's OpenMP threads laid out on that socket's cores. Using a "hybrid Hello World" program, I can achieve this with Intel MPI (after a lot of testing):

(1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 ./hello-hybrid.x | sort -g -k 18
srun.slurm: cluster configuration lacks support for cpu binding
Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27

Other than the odd fact that Process #0 seemed to start on Socket #1 (this might be an artifact of how I'm trying to detect the CPU I'm on), this looks reasonable. 14 threads on each socket and each process is laying out its threads in a nice orderly fashion.

I'm trying to figure out how to do this with Open MPI (version 1.10.0) and apparently I am just not quite good enough to figure it out. The closest I've gotten is:

(1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket ./hello-hybrid.x | sort -g -k 18
Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20

Obviously not right. Any ideas on how to help me learn? The man mpirun page is a bit formidable in the pinning part, so maybe I've missed an obvious answer.

Matt

--
Matt Thompson
Man Among Men
Fulcrum of History
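
For reference, the layout being asked for can be written out as simple arithmetic. This is only a sketch: it assumes cores are numbered consecutively within a socket (0-13 on one, 14-27 on the other), as the Intel MPI output above suggests, and it assigns ranks to core blocks in rank order, which is not required (Intel MPI put rank 2 on CPUs 0-6). The names `desired_cpu`, `socket_of`, and `desired_layout` are hypothetical helpers, not part of Open MPI or any other library.

```python
# Hypothetical helpers describing the target layout; not an MPI API.
# Assumption: cores 0-13 belong to socket 0 and cores 14-27 to socket 1.

CORES_PER_SOCKET = 14
NUM_PROCS = 4
THREADS_PER_PROC = 7


def desired_cpu(rank, thread):
    """Core that OpenMP thread `thread` of MPI rank `rank` would be pinned to."""
    return rank * THREADS_PER_PROC + thread


def socket_of(cpu):
    """Socket that owns a core under the assumed consecutive numbering."""
    return cpu // CORES_PER_SOCKET


def desired_layout():
    """Return {rank: [cpu, ...]} with one distinct core per thread."""
    return {
        rank: [desired_cpu(rank, t) for t in range(THREADS_PER_PROC)]
        for rank in range(NUM_PROCS)
    }


if __name__ == "__main__":
    for rank, cpus in desired_layout().items():
        print(f"process {rank}: socket {socket_of(cpus[0])}, "
              f"CPUs {cpus[0]}-{cpus[-1]}")
```

Under these assumptions every thread gets its own core and each socket hosts two processes. In the Open MPI run above, by contrast, the two processes on a socket report the same seven CPUs.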