Aha! The Gurus know all. The -map-by option was the magic sauce:

(1176) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by
ppr:2:socket:pe=7 ./hello-hybrid.x | sort -g -k 18
Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 7
Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 8
Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 9
Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 10
Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 11
Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 12
Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 13
Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 21
Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 22
Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 23
Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 24
Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 25
Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 26
Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 27

So, a question: what does "ppr" mean? The man page seems to accept it as an
axiom of Open MPI:

       --map-by <foo>
              Map to the specified object, defaults to socket. Supported
              options include slot, hwthread, core, L1cache, L2cache,
              L3cache, socket, numa, board, node, sequential, distance,
              and ppr. Any object can include modifiers by adding a : and
              any combination of PE=n (bind n processing elements to each
              proc), SPAN (load balance the processes across the
              allocation), OVERSUBSCRIBE (allow more processes on a node
              than processing elements), and NOOVERSUBSCRIBE. This
              includes PPR, where the pattern would be terminated by
              another colon to separate it from the modifiers.

Is it an acronym/initialism? From some experimenting, it seems that
ppr:2:socket means 2 processes per socket, and pe=7 means bind 7 processing
elements (here, cores) to each process? Is that about right?
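
(For what it's worth, it looks like one could double-check whatever ppr and pe
actually do by adding --report-bindings, the option Nick uses below, to the
same mpirun line, e.g.:

mpirun -np 4 -map-by ppr:2:socket:pe=7 --report-bindings ./hello-hybrid.x

which should make Open MPI print each rank's binding mask to stderr.)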

Matt
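
P.S. In case anyone wants to reproduce this: hello-hybrid.x is just the usual
MPI+OpenMP "hybrid Hello World". A minimal C sketch along those lines (not
necessarily identical to mine; it uses sched_getcpu(), a glibc extension, for
the "on CPU" field) would be:

#define _GNU_SOURCE
#include <sched.h>     /* sched_getcpu() */
#include <stdio.h>
#include <unistd.h>    /* gethostname() */
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided, rank, size;
    char host[256];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    gethostname(host, sizeof(host));

    /* Each OpenMP thread reports its id, its MPI rank, the host name,
       and the CPU it happens to be running on. */
    #pragma omp parallel
    printf("Hello from thread %d out of %d from process %d out of %d on %s on CPU %d\n",
           omp_get_thread_num(), omp_get_num_threads(),
           rank, size, host, sched_getcpu());

    MPI_Finalize();
    return 0;
}

compiled with something like "mpicc -fopenmp hello-hybrid.c -o hello-hybrid.x"
(or the Intel compiler equivalent).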

On Wed, Jan 6, 2016 at 3:19 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I believe he wants two procs/socket, so you’d need ppr:2:socket:pe=7
>
>
> On Jan 6, 2016, at 12:14 PM, Nick Papior <nickpap...@gmail.com> wrote:
>
> I do not think KMP_AFFINITY should affect anything in Open MPI; it is an
> Intel MKL/OpenMP runtime env setting, isn't it? Or am I wrong?
>
> Note that these flags are used in an environment where Open MPI automatically
> gets the host file, so no host-file options appear here.
> With Intel MKL and Open MPI, I got the best performance using these rather
> long flags:
>
> export KMP_AFFINITY=verbose,compact,granularity=core
> export KMP_STACKSIZE=62M
> export KMP_SETTINGS=1
>
> def_flags="--bind-to core -x OMP_PROC_BIND=true --report-bindings"
> def_flags="$def_flags -x KMP_AFFINITY=$KMP_AFFINITY"
>
> # number of OpenMP threads per MPI process; in your case 7:
> ONP=7
> flags="$def_flags -x MKL_NUM_THREADS=$ONP -x MKL_DYNAMIC=FALSE"
> flags="$flags -x OMP_NUM_THREADS=$ONP -x OMP_DYNAMIC=FALSE"
> flags="$flags -x KMP_STACKSIZE=$KMP_STACKSIZE"
> flags="$flags --map-by ppr:1:socket:pe=7"
>
> then run your program:
>
> mpirun $flags <app>
>
> A lot of the option flags are duplicated (and strictly speaking not needed),
> but I include them so that changes are easy to test.
> This is certainly application dependent, but in my case it performed
> really well.
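>
> For clarity, with those variables substituted the final command comes out to
> roughly (all on one line):
>
> mpirun --bind-to core -x OMP_PROC_BIND=true --report-bindings -x KMP_AFFINITY=verbose,compact,granularity=core -x MKL_NUM_THREADS=7 -x MKL_DYNAMIC=FALSE -x OMP_NUM_THREADS=7 -x OMP_DYNAMIC=FALSE -x KMP_STACKSIZE=62M --map-by ppr:1:socket:pe=7 <app>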
>
>
> 2016-01-06 20:48 GMT+01:00 Erik Schnetter <schnet...@gmail.com>:
>
>> Setting KMP_AFFINITY will probably override anything that OpenMPI
>> sets. Can you try without?
>>
>> -erik
>>
>> On Wed, Jan 6, 2016 at 2:46 PM, Matt Thompson <fort...@gmail.com> wrote:
>> > Hello Open MPI Gurus,
>> >
>> > As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to get
>> > the same behavior from various stacks. For example, I have a 28-core node
>> > (2 x 14-core Haswells), and I'd like to run 4 MPI processes with 7 OpenMP
>> > threads each. That is, I'd like 2 processes per socket, with each process's
>> > OpenMP threads laid out on its own socket. Using a "hybrid Hello World"
>> > program, I can achieve this with Intel MPI (after a lot of testing):
>> >
>> > (1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4
>> > ./hello-hybrid.x | sort -g -k 18
>> > srun.slurm: cluster configuration lacks support for cpu binding
>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27
>> >
>> > Other than the odd fact that Process #0 seemed to start on Socket #1 (this
>> > might be an artifact of how I'm trying to detect the CPU I'm on), this looks
>> > reasonable: 14 threads on each socket, and each process lays out its
>> > threads in a nice, orderly fashion.
>> >
>> > I'm trying to figure out how to do this with Open MPI (version 1.10.0) and
>> > apparently I am just not quite good enough to figure it out. The closest
>> > I've gotten is:
>> >
>> > (1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by
>> > ppr:2:socket ./hello-hybrid.x | sort -g -k 18
>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20
>> >
>> > Obviously not right. Any ideas to help me learn? The mpirun man page is a
>> > bit formidable in the pinning sections, so maybe I've missed an obvious
>> > answer.
>> >
>> > Matt
>> > --
>> > Matt Thompson
>> >
>> > Man Among Men
>> > Fulcrum of History
>> >
>> >
>>
>>
>>
>> --
>> Erik Schnetter <schnet...@gmail.com>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>
>
>
>
> --
> Kind regards Nick
>
>
>
>



-- 
Matt Thompson

Man Among Men
Fulcrum of History
