Thanks for the clarification. :)

2016-01-07 0:48 GMT+01:00 Jeff Hammond <jeff.scie...@gmail.com>:
> KMP_AFFINITY is an Intel OpenMP runtime setting, not an MKL option,
> although MKL will respect it since MKL uses the Intel OpenMP runtime
> (by default, at least).
>
> The OpenMP 4.0 equivalents of KMP_AFFINITY are OMP_PROC_BIND and
> OMP_PLACES. I do not know how many OpenMP implementations support these
> two options, but Intel and GCC should.
>
> Best,
>
> Jeff
>
> On Wed, Jan 6, 2016 at 1:04 PM, Nick Papior <nickpap...@gmail.com> wrote:
>
>> Ok, thanks :)
>>
>> 2016-01-06 22:03 GMT+01:00 Ralph Castain <r...@open-mpi.org>:
>>
>>> Not really - just consistent with the other cmd line options.
>>>
>>> On Jan 6, 2016, at 12:58 PM, Nick Papior <nickpap...@gmail.com> wrote:
>>>
>>> It was just that when I started using map-by I didn't get why it is:
>>> ppr:2
>>> but
>>> PE=2
>>> I would at least have expected:
>>> ppr=2:PE=2
>>> or
>>> ppr:2:PE:2
>>> Is there a reason for this?
>>>
>>> 2016-01-06 21:54 GMT+01:00 Ralph Castain <r...@open-mpi.org>:
>>>
>>>> <LOL> ah yes, "r" = "resource"!! Thanks for the reminder :-)
>>>>
>>>> The difference in delimiter is just to simplify parsing - we can
>>>> "split" the string on colons to separate out the options, and then
>>>> use "=" to set the value. Nothing particularly significant about
>>>> the choice.
>>>>
>>>> On Jan 6, 2016, at 12:48 PM, Nick Papior <nickpap...@gmail.com> wrote:
>>>>
>>>> You are correct. "socket" means that the resource is the socket, and
>>>> "ppr:2" means 2 processes per resource.
>>>> PE=<n> is the number of Processing Elements per process.
>>>>
>>>> Perhaps the devs can shed some light on why PE uses "=" while ppr
>>>> uses ":" as the delimiter for the resource request?
>>>>
>>>> This "old" slide show from Jeff shows the usage (although the inputs
>>>> have changed since 1.7):
>>>>
>>>> http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation
>>>>
>>>> 2016-01-06 21:33 GMT+01:00 Matt Thompson <fort...@gmail.com>:
>>>>
>>>>> A ha! The Gurus know all.
>>>>> The map-by option was the magic sauce:
>>>>>
>>>>> (1176) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket:pe=7 ./hello-hybrid.x | sort -g -k 18
>>>>> Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
>>>>> Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
>>>>> Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
>>>>> Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
>>>>> Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
>>>>> Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
>>>>> Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
>>>>> Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 7
>>>>> Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 8
>>>>> Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 9
>>>>> Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 10
>>>>> Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 11
>>>>> Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 12
>>>>> Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 13
>>>>> Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
>>>>> Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
>>>>> Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
>>>>> Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
>>>>> Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
>>>>> Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
>>>>> Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
>>>>> Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 21
>>>>> Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 22
>>>>> Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 23
>>>>> Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 24
>>>>> Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 25
>>>>> Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 26
>>>>> Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 27
>>>>>
>>>>> So, a question: what does "ppr" mean? The man page seems to accept it
>>>>> as an axiom of Open MPI:
>>>>>
>>>>>   --map-by <foo>
>>>>>       Map to the specified object, defaults to socket. Supported options
>>>>>       include slot, hwthread, core, L1cache, L2cache, L3cache, socket,
>>>>>       numa, board, node, sequential, distance, and ppr. Any object can
>>>>>       include modifiers by adding a : and any combination of PE=n (bind
>>>>>       n processing elements to each proc), SPAN (load balance the
>>>>>       processes across the allocation), OVERSUBSCRIBE (allow more
>>>>>       processes on a node than processing elements), and
>>>>>       NOOVERSUBSCRIBE. This includes PPR, where the pattern would be
>>>>>       terminated by another colon to separate it from the modifiers.
>>>>>
>>>>> Is it an acronym/initialism? From some experimenting it seems that
>>>>> ppr:2:socket means 2 processes per socket?
>>>>> And pe=7 means leave 7 processes between them? Is that about right?
>>>>>
>>>>> Matt
>>>>>
>>>>> On Wed, Jan 6, 2016 at 3:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>> I believe he wants two procs/socket, so you'd need ppr:2:socket:pe=7
>>>>>>
>>>>>> On Jan 6, 2016, at 12:14 PM, Nick Papior <nickpap...@gmail.com> wrote:
>>>>>>
>>>>>> I do not think KMP_AFFINITY should affect anything in Open MPI; isn't
>>>>>> it an MKL env setting? Or am I wrong?
>>>>>>
>>>>>> Note that these are used in an environment where Open MPI
>>>>>> automatically gets the host file, so it is not present in the flags
>>>>>> below. With Intel MKL and Open MPI I got the best performance using
>>>>>> these rather long flags:
>>>>>>
>>>>>> export KMP_AFFINITY=verbose,compact,granularity=core
>>>>>> export KMP_STACKSIZE=62M
>>>>>> export KMP_SETTINGS=1
>>>>>>
>>>>>> def_flags="--bind-to core -x OMP_PROC_BIND=true --report-bindings"
>>>>>> def_flags="$def_flags -x KMP_AFFINITY=$KMP_AFFINITY"
>>>>>>
>>>>>> # in your case 7:
>>>>>> ONP=7
>>>>>> flags="$def_flags -x MKL_NUM_THREADS=$ONP -x MKL_DYNAMIC=FALSE"
>>>>>> flags="$flags -x OMP_NUM_THREADS=$ONP -x OMP_DYNAMIC=FALSE"
>>>>>> flags="$flags -x KMP_STACKSIZE=$KMP_STACKSIZE"
>>>>>> flags="$flags --map-by ppr:1:socket:pe=7"
>>>>>>
>>>>>> then run your program:
>>>>>>
>>>>>> mpirun $flags <app>
>>>>>>
>>>>>> A lot of the option flags are duplicated (and strictly speaking not
>>>>>> needed), but I provide them to make it easy to test changes.
>>>>>> This is certainly application dependent, but in my case it performed
>>>>>> really well.
>>>>>>
>>>>>> 2016-01-06 20:48 GMT+01:00 Erik Schnetter <schnet...@gmail.com>:
>>>>>>
>>>>>>> Setting KMP_AFFINITY will probably override anything that Open MPI
>>>>>>> sets. Can you try without?
>>>>>>>
>>>>>>> -erik
>>>>>>>
>>>>>>> On Wed, Jan 6, 2016 at 2:46 PM, Matt Thompson <fort...@gmail.com> wrote:
>>>>>>> > Hello Open MPI Gurus,
>>>>>>> >
>>>>>>> > As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how
>>>>>>> > to get the same behavior in various stacks. For example, I have a
>>>>>>> > 28-core node (2 x 14-core Haswells), and I'd like to run 4 MPI
>>>>>>> > processes with 7 OpenMP threads each. Thus, I'd like 2 processes
>>>>>>> > per socket with the OpenMP threads laid out on them.
>>>>>>> > Using a "hybrid Hello World" program, I can achieve this with
>>>>>>> > Intel MPI (after a lot of testing):
>>>>>>> >
>>>>>>> > (1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 ./hello-hybrid.x | sort -g -k 18
>>>>>>> > srun.slurm: cluster configuration lacks support for cpu binding
>>>>>>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
>>>>>>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
>>>>>>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
>>>>>>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
>>>>>>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
>>>>>>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
>>>>>>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
>>>>>>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
>>>>>>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
>>>>>>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
>>>>>>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
>>>>>>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
>>>>>>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
>>>>>>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
>>>>>>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
>>>>>>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
>>>>>>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
>>>>>>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
>>>>>>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
>>>>>>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
>>>>>>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
>>>>>>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
>>>>>>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
>>>>>>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
>>>>>>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
>>>>>>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
>>>>>>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
>>>>>>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27
>>>>>>> >
>>>>>>> > Other than the odd fact that Process #0 seemed to start on Socket
>>>>>>> > #1 (this might be an artifact of how I'm trying to detect the CPU
>>>>>>> > I'm on), this looks reasonable. 14 threads on each socket and each
>>>>>>> > process is laying out its threads in a nice orderly fashion.
>>>>>>> >
>>>>>>> > I'm trying to figure out how to do this with Open MPI (version
>>>>>>> > 1.10.0) and apparently I am just not quite good enough to figure
>>>>>>> > it out.
>>>>>>> > The closest I've gotten is:
>>>>>>> >
>>>>>>> > (1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket ./hello-hybrid.x | sort -g -k 18
>>>>>>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
>>>>>>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
>>>>>>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
>>>>>>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
>>>>>>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
>>>>>>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
>>>>>>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
>>>>>>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
>>>>>>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
>>>>>>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
>>>>>>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
>>>>>>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
>>>>>>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
>>>>>>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
>>>>>>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
>>>>>>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
>>>>>>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
>>>>>>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
>>>>>>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
>>>>>>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
>>>>>>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
>>>>>>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
>>>>>>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
>>>>>>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
>>>>>>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
>>>>>>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
>>>>>>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
>>>>>>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20
>>>>>>> >
>>>>>>> > Obviously not right. Any ideas on how to help me learn? The mpirun
>>>>>>> > man page is a bit formidable in the pinning part, so maybe I've
>>>>>>> > missed an obvious answer.
>>>>>>> >
>>>>>>> > Matt
--
Kind regards Nick
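
To summarize the syntax settled on above: per Ralph and Nick, "ppr" is short
for "processes per resource", so ppr:2:socket reads as "two processes per
socket", and pe=7 binds 7 processing elements (here, cores) to each of those
processes. A minimal sketch for checking such a layout without a hello-world
program, using only options that already appear in this thread (the exact
format of the binding report depends on the Open MPI version):

    # 4 ranks total: 2 per socket, each bound to 7 cores
    mpirun -np 4 --report-bindings \
        --map-by ppr:2:socket:pe=7 ./hello-hybrid.x

--report-bindings prints each rank's binding as it launches, which is
usually enough to confirm the mapping before running a real job.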
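
Nick's flag block above targets one process per socket (ppr:1:socket:pe=7);
adapting it to the two-per-socket layout that solved Matt's problem only
changes the map-by line. A trimmed sketch, assuming the same Intel OpenMP
environment (the KMP_AFFINITY value combines settings that appear separately
in the thread, and ./hello-hybrid.x stands in for your application):

    # Intel OpenMP runtime settings (KMP_* only affects the Intel runtime)
    export OMP_NUM_THREADS=7
    export KMP_AFFINITY=compact,granularity=core

    # Open MPI: bind each rank to cores, report bindings, forward the env
    flags="--bind-to core --report-bindings"
    flags="$flags -x OMP_NUM_THREADS=$OMP_NUM_THREADS"
    flags="$flags -x KMP_AFFINITY=$KMP_AFFINITY"

    # 2 ranks per socket, 7 processing elements (cores) per rank
    flags="$flags --map-by ppr:2:socket:pe=7"

    mpirun -np 4 $flags ./hello-hybrid.x | sort -g -k 18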
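
Following Jeff Hammond's point that OMP_PROC_BIND and OMP_PLACES are the
OpenMP 4.0 equivalents of KMP_AFFINITY, a more portable variant would drop
the KMP_* variables entirely (as Erik suggested) and let the OpenMP runtime
pin threads within the cores that mpirun hands each rank. The particular
values "cores" and "close" are a rough analogue of KMP_AFFINITY=compact,
not something tested in this thread, and support depends on the OpenMP
runtime (Intel and GCC should accept them, per Jeff):

    # let Open MPI place and bind the ranks, then let OpenMP 4.0 pin threads
    mpirun -np 4 --map-by ppr:2:socket:pe=7 \
        -x OMP_NUM_THREADS=7 -x OMP_PLACES=cores -x OMP_PROC_BIND=close \
        ./hello-hybrid.x | sort -g -k 18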