Nick Papior <nickpap...@gmail.com> writes:

> This is what I do to successfully get the best performance for my
> application using OpenMP and OpenMPI:
>
> (note this is for 8 cores per socket)
>
> mpirun -x OMP_PROC_BIND=true --report-bindings -x OMP_NUM_THREADS=8
> --map-by ppr:1:socket:pe=8 <exec>
>
> It assigns 8 cores to each MPI process, one process per socket; each MPI
> process then runs 8 OpenMP threads, and OMP_PROC_BIND pins those threads
> for affinity. This is for OpenMPI >= 1.8.

I assume there's a good reason, since it's the default, but what makes
binding to sockets better than binding at a lower level, such as a NUMA
domain or a shared L3 cache (if there is a relevant lower level)?  Memory
latency and bandwidth may differ significantly between different
localities on the same socket.
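
(For comparison, a sketch rather than anything from Nick's setup: on a
machine whose sockets contain more than one NUMA domain with, say, 4
cores each, the same ppr syntax accepts finer mapping objects such as
numa:

  mpirun -x OMP_PROC_BIND=true -x OMP_NUM_THREADS=4 \
         --report-bindings --map-by ppr:1:numa:pe=4 <exec>

which starts one process per NUMA domain, so a process's threads never
cross a memory-locality boundary.)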

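One way to answer the question empirically is to print where each rank's
threads actually land and compare that with the --report-bindings
output. A minimal sketch (my own, not from this thread; Linux-specific
because of sched_getcpu):

  /* check_binding.c -- compile with: mpicc -fopenmp check_binding.c */
  #define _GNU_SOURCE
  #include <mpi.h>
  #include <omp.h>
  #include <sched.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      #pragma omp parallel
      {
          /* sched_getcpu() reports the core this thread is running on */
          printf("rank %d thread %d on cpu %d\n",
                 rank, omp_get_thread_num(), sched_getcpu());
      }
      MPI_Finalize();
      return 0;
  }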