Nick Papior <nickpap...@gmail.com> writes:

> This is what I do to successfully get the best performance for my
> application using OpenMP and OpenMPI:
>
> (note this is for 8 cores per socket)
>
> mpirun -x OMP_PROC_BIND=true --report-bindings -x OMP_NUM_THREADS=8 \
>     --map-by ppr:1:socket:pe=8 <exec>
>
> It assigns 8 cores per MPI process, separated by sockets; each MPI
> process gets 8 threads, and I bind the OpenMP threads for affinity.
> This is for OpenMPI >= 1.8.
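For reference, here is the same launch with the OpenMP places made
explicit (a sketch; the OMP_PLACES=cores setting and the ./my_app
executable name are my additions, not part of Nick's original command):

    mpirun -x OMP_PROC_BIND=true -x OMP_PLACES=cores -x OMP_NUM_THREADS=8 \
        --report-bindings --map-by ppr:1:socket:pe=8 ./my_app

OMP_PROC_BIND=true on its own leaves thread placement within the
8-core slot to the OpenMP runtime; adding OMP_PLACES=cores typically
pins each of the 8 threads to its own core.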
I assume there's a good reason, since it's the default, but what makes binding to sockets better than binding at a lower level (if there is a relevant lower level)? Latency and bandwidth may differ significantly between localities on the same socket.
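To make the question concrete, the kind of lower-level mapping I have
in mind would look something like this (a sketch, assuming the ppr
"numa" object in Open MPI >= 1.8 and a machine with two NUMA domains
of 4 cores per socket; ./my_app is a placeholder):

    mpirun -x OMP_PROC_BIND=true -x OMP_NUM_THREADS=4 --report-bindings \
        --map-by ppr:1:numa:pe=4 ./my_app

i.e. one rank per NUMA domain, with its threads confined to that
domain's cores, rather than one rank spanning a whole socket.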