I do not think KMP_AFFINITY should affect anything in Open MPI; isn't it an
MKL / Intel OpenMP runtime environment setting? Or am I wrong?

Note that these flags are used in an environment where Open MPI picks up the
host file automatically, hence no host-file options appear below.
With Intel MKL and Open MPI I got the best performance using these rather
long flags:

export KMP_AFFINITY=verbose,compact,granularity=core
export KMP_STACKSIZE=62M
export KMP_SETTINGS=1

def_flags="--bind-to core -x OMP_PROC_BIND=true --report-bindings"
def_flags="$def_flags -x KMP_AFFINITY=$KMP_AFFINITY"

# in your case 7:
ONP=7
flags="$def_flags -x MKL_NUM_THREADS=$ONP -x MKL_DYNAMIC=FALSE"
flags="$flags -x OMP_NUM_THREADS=$ONP -x OMP_DYNAMIC=FALSE"
flags="$flags -x KMP_STACKSIZE=$KMP_STACKSIZE"
flags="$flags --map-by ppr:1:socket:pe=7"

Then run your program:

mpirun $flags <app>
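
For the layout you describe (4 ranks on one 2 x 14-core node, 2 ranks per
socket, 7 threads each), I would expect the closest equivalent to be something
along these lines; I have not tested this exact layout, so take it as a sketch
only:

mpirun -np 4 --map-by ppr:2:socket:pe=7 --bind-to core \
       -x OMP_NUM_THREADS=7 -x OMP_PROC_BIND=true --report-bindings ./hello-hybrid.x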

A lot of the option flags are duplicated (and strictly speaking not needed),
but I include them to make it easy to test changes.
This is of course application dependent, but in my case it performed really
well.
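
In case it is useful for testing, a minimal hybrid hello-world along the lines
of what is used below might look like this. This is just my own sketch, not
the actual hello-hybrid.x; it assumes Linux, so that sched_getcpu() is
available, and can be built with e.g. "mpicc -fopenmp hello-hybrid.c -o
hello-hybrid.x":

/* Minimal MPI+OpenMP "hello" sketch for checking process/thread bindings. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided, rank, size, hostlen;
    char host[MPI_MAX_PROCESSOR_NAME];

    /* FUNNELED is enough: only the master thread calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &hostlen);

    #pragma omp parallel
    {
        /* sched_getcpu() reports the CPU the calling thread is running on */
        printf("Hello from thread %d out of %d from process %d out of %d on %s on CPU %d\n",
               omp_get_thread_num(), omp_get_num_threads(),
               rank, size, host, sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}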


2016-01-06 20:48 GMT+01:00 Erik Schnetter <schnet...@gmail.com>:

> Setting KMP_AFFINITY will probably override anything that OpenMPI
> sets. Can you try without?
>
> -erik
>
> On Wed, Jan 6, 2016 at 2:46 PM, Matt Thompson <fort...@gmail.com> wrote:
> > Hello Open MPI Gurus,
> >
> > As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to get
> > the same behavior from various stacks. For example, I have a 28-core node
> > (2 14-core Haswells), and I'd like to run 4 MPI processes with 7 OpenMP
> > threads each. That is, I'd like 2 processes per socket, with each process's
> > OpenMP threads laid out on that socket's cores. Using a "hybrid Hello World"
> > program, I can achieve this with Intel MPI (after a lot of testing):
> >
> > (1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 ./hello-hybrid.x | sort -g -k 18
> > srun.slurm: cluster configuration lacks support for cpu binding
> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27
> >
> > Other than the odd fact that Process #0 seemed to start on Socket #1 (this
> > might be an artifact of how I'm trying to detect the CPU I'm on), this looks
> > reasonable: 14 threads on each socket, and each process lays out its
> > threads in a nice, orderly fashion.
> >
> > I'm trying to figure out how to do this with Open MPI (version 1.10.0),
> > and apparently I am just not quite good enough to figure it out. The
> > closest I've gotten is:
> >
> > (1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket ./hello-hybrid.x | sort -g -k 18
> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20
> >
> > Obviously not right. Any ideas on how to help me learn? The mpirun man
> > page is a bit formidable in the pinning section, so maybe I've missed an
> > obvious answer.
> >
> > Matt
> > --
> > Matt Thompson
> >
> > Man Among Men
> > Fulcrum of History
> >
> >
>
>
>
> --
> Erik Schnetter <schnet...@gmail.com>
> http://www.perimeterinstitute.ca/personal/eschnetter/
>



-- 
Kind regards Nick
