I do not think KMP_AFFINITY should affect anything in Open MPI; isn't it an Intel MKL environment setting? Or am I wrong?
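
In case it helps, both layers can report what they actually end up doing, which makes it easy to check whether KMP_AFFINITY interferes with Open MPI's binding. A rough, untested sketch (the rank count and <app> are placeholders; the flag values are the ones used further down in this mail):

  # Open MPI reports the bindings it applies itself:
  mpirun --report-bindings -np 4 <app>

  # and the Intel OpenMP runtime reports the affinity it sets on top of that:
  mpirun -np 4 -x KMP_AFFINITY=verbose,compact,granularity=core <app>

Comparing the two outputs should show which layer wins.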
Note that these flags are used in an environment where Open MPI automatically gets the host-file, hence no host-file options are present. With Intel MKL and Open MPI I got the best performance using these rather long flags:

  export KMP_AFFINITY=verbose,compact,granularity=core
  export KMP_STACKSIZE=62M
  export KMP_SETTINGS=1

  def_flags="--bind-to core -x OMP_PROC_BIND=true --report-bindings"
  def_flags="$def_flags -x KMP_AFFINITY=$KMP_AFFINITY"
  # in your case 7:
  ONP=7
  flags="$def_flags -x MKL_NUM_THREADS=$ONP -x MKL_DYNAMIC=FALSE"
  flags="$flags -x OMP_NUM_THREADS=$ONP -x OMP_DYNAMIC=FALSE"
  flags="$flags -x KMP_STACKSIZE=$KMP_STACKSIZE"
  flags="$flags --map-by ppr:1:socket:pe=7"

and then run your program:

  mpirun $flags <app>

Several of the options are duplicated (and strictly not needed), but I provide them to make it easy to test changes. This is surely application dependent, but in my case it performed really well.

2016-01-06 20:48 GMT+01:00 Erik Schnetter <schnet...@gmail.com>:
> Setting KMP_AFFINITY will probably override anything that OpenMPI
> sets. Can you try without?
>
> -erik
>
> On Wed, Jan 6, 2016 at 2:46 PM, Matt Thompson <fort...@gmail.com> wrote:
> > Hello Open MPI Gurus,
> >
> > As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to do
> > things to get the same behavior in various stacks. For example, I have a
> > 28-core node (2 14-core Haswells), and I'd like to run 4 MPI processes
> > and 7 OpenMP threads. Thus, I'd like the processes to be 2 processes per
> > socket with the OpenMP threads laid out on them. Using a "hybrid Hello
> > World" program, I can achieve this with Intel MPI (after a lot of
> > testing):
> >
> > (1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 ./hello-hybrid.x | sort -g -k 18
> > srun.slurm: cluster configuration lacks support for cpu binding
> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27
> >
> > Other than the odd fact that Process #0 seemed to start on Socket #1
> > (this might be an artifact of how I'm trying to detect the CPU I'm on),
> > this looks reasonable. 14 threads on each socket and each process is
> > laying out its threads in a nice orderly fashion.
> >
> > I'm trying to figure out how to do this with Open MPI (version 1.10.0)
> > and apparently I am just not quite good enough to figure it out. The
> > closest I've gotten is:
> >
> > (1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket ./hello-hybrid.x | sort -g -k 18
> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20
> >
> > Obviously not right. Any ideas on how to help me learn? The man mpirun
> > page is a bit formidable in the pinning part, so maybe I've missed an
> > obvious answer.
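
Answering this part inline: adapting the ppr/pe flags from the top of this mail to the 4-rank case, I would expect something along these lines to give 2 processes per socket with 7 cores each. This is an untested sketch, so verify the placement with the --report-bindings output; option spellings can differ between Open MPI versions.

  mpirun -np 4 --map-by ppr:2:socket:pe=7 --bind-to core --report-bindings \
         -x OMP_NUM_THREADS=7 -x OMP_PROC_BIND=true -x KMP_AFFINITY=compact \
         ./hello-hybrid.x | sort -g -k 18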
> >
> > Matt
> >
> > --
> > Matt Thompson
> > Man Among Men
> > Fulcrum of History
>
> --
> Erik Schnetter <schnet...@gmail.com>
> http://www.perimeterinstitute.ca/personal/eschnetter/

--
Kind regards Nick