A ha! The Gurus know all. The map-by was the magic sauce:

(1176) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket:pe=7 ./hello-hybrid.x | sort -g -k 18
Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 7
Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 8
Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 9
Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 10
Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 11
Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 12
Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 13
Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 21
Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 22
Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 23
Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 24
Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 25
Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 26
Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 27
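For reference, the hello-hybrid.x program itself never appears in this thread. The following is a hypothetical reconstruction (not the actual source used above) that would print output in the same format; it assumes Linux, since it uses sched_getcpu() to report the CPU number:

/* hello-hybrid.c: hypothetical MPI+OpenMP "hello world" sketch.
 * Each OpenMP thread of each MPI rank reports the CPU it is running on.
 * Assumes Linux (sched_getcpu); not the actual source from this thread. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>    /* sched_getcpu() */
#include <unistd.h>   /* gethostname()  */
#include <omp.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    char host[256];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    gethostname(host, sizeof(host));

    #pragma omp parallel
    {
        /* The CPU number is field 18 of this line, hence "sort -g -k 18". */
        printf("Hello from thread %d out of %d from process %d out of %d on %s on CPU %d\n",
               omp_get_thread_num(), omp_get_num_threads(),
               rank, nranks, host, sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}

Something along the lines of "mpicc -fopenmp hello-hybrid.c -o hello-hybrid.x" should build it (the exact OpenMP flag depends on the compiler).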
So, a question: what does "ppr" mean? The man page seems to accept it as an axiom of Open MPI:

       --map-by <foo>
              Map to the specified object, defaults to socket. Supported
              options include slot, hwthread, core, L1cache, L2cache,
              L3cache, socket, numa, board, node, sequential, distance, and
              ppr. Any object can include modifiers by adding a : and any
              combination of PE=n (bind n processing elements to each proc),
              SPAN (load balance the processes across the allocation),
              OVERSUBSCRIBE (allow more processes on a node than processing
              elements), and NOOVERSUBSCRIBE. This includes PPR, where the
              pattern would be terminated by another colon to separate it
              from the modifiers.

Is it an acronym/initialism? From some experimenting, it seems that ppr:2:socket means 2 processes per socket? And pe=7 means leave 7 processes between them? Is that about right?

Matt

On Wed, Jan 6, 2016 at 3:19 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I believe he wants two procs/socket, so you’d need ppr:2:socket:pe=7
>
>
> On Jan 6, 2016, at 12:14 PM, Nick Papior <nickpap...@gmail.com> wrote:
>
> I do not think KMP_AFFINITY should affect anything in OpenMPI, it is an
> MKL env setting? Or am I wrong?
>
> Note that these are used in an environment where openmpi automatically
> gets the host-file. Hence they are not present.
> With intel mkl and openmpi I got the best performance using these, rather
> long flags:
>
> export KMP_AFFINITY=verbose,compact,granularity=core
> export KMP_STACKSIZE=62M
> export KMP_SETTINGS=1
>
> def_flags="--bind-to core -x OMP_PROC_BIND=true --report-bindings"
> def_flags="$def_flags -x KMP_AFFINITY=$KMP_AFFINITY"
>
> # in your case 7:
> ONP=7
> flags="$def_flags -x MKL_NUM_THREADS=$ONP -x MKL_DYNAMIC=FALSE"
> flags="$flags -x OMP_NUM_THREADS=$ONP -x OMP_DYNAMIC=FALSE"
> flags="$flags -x KMP_STACKSIZE=$KMP_STACKSIZE"
> flags="$flags --map-by ppr:1:socket:pe=7"
>
> then run your program:
>
> mpirun $flags <app>
>
> A lot of the option flags are duplicated (and strictly not needed), but I
> provide them for easy testing changes.
> Surely this is application dependent, but for my case it was performing
> really well.
>
>
> 2016-01-06 20:48 GMT+01:00 Erik Schnetter <schnet...@gmail.com>:
>
>> Setting KMP_AFFINITY will probably override anything that OpenMPI
>> sets. Can you try without?
>>
>> -erik
>>
>> On Wed, Jan 6, 2016 at 2:46 PM, Matt Thompson <fort...@gmail.com> wrote:
>> > Hello Open MPI Gurus,
>> >
>> > As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to
>> > do things to get the same behavior in various stacks. For example, I
>> > have a 28-core node (2 14-core Haswells), and I'd like to run 4 MPI
>> > processes and 7 OpenMP threads. Thus, I'd like the processes to be
>> > 2 processes per socket with the OpenMP threads laid out on them.
>> > Using a "hybrid Hello World" program, I can achieve this with Intel
>> > MPI (after a lot of testing):
>> >
>> > (1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 ./hello-hybrid.x | sort -g -k 18
>> > srun.slurm: cluster configuration lacks support for cpu binding
>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27
>> >
>> > Other than the odd fact that Process #0 seemed to start on Socket #1
>> > (this might be an artifact of how I'm trying to detect the CPU I'm on),
>> > this looks reasonable. 14 threads on each socket and each process is
>> > laying out its threads in a nice orderly fashion.
>> >
>> > I'm trying to figure out how to do this with Open MPI (version 1.10.0)
>> > and apparently I am just not quite good enough to figure it out.
>> > The closest I've gotten is:
>> >
>> > (1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket ./hello-hybrid.x | sort -g -k 18
>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20
>> >
>> > Obviously not right. Any ideas on how to help me learn? The man mpirun
>> > page is a bit formidable in the pinning part, so maybe I've missed an
>> > obvious answer.
>> >
>> > Matt
>> > --
>> > Matt Thompson
>> >
>> > Man Among Men
>> > Fulcrum of History
>>
>> --
>> Erik Schnetter <schnet...@gmail.com>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>
> --
> Kind regards Nick

--
Matt Thompson

Man Among Men
Fulcrum of History