2015-11-05 18:51 GMT+01:00 Dave Love <d.l...@liverpool.ac.uk>:

> Nick Papior <nickpap...@gmail.com> writes:
>
> > This is what I do to successfully get the best performance for my
> > application using OpenMP and OpenMPI:
> >
> > (note this is for 8 cores per socket)
> >
> > mpirun -x OMP_PROC_BIND=true --report-bindings -x OMP_NUM_THREADS=8
> >   --map-by ppr:1:socket:pe=8 <exec>
> >
> > It assigns 8 cores per MPI process, separated by sockets; each MPI
> > process gets 8 threads, and I bind the OpenMP threads for affinity.
> > This is for OpenMPI >= 1.8.
>
> I assume there's a good reason, since it's the default, but what makes
> binding to sockets better than to a lower level (if there is a relevant
> lower level)? The latency and bandwidth may be significantly different
> between different localities on the same socket.

Sure, I guess you would have to check the NUMA latencies/distances to the
other processors so that you do not end up on a _bad_ node? I am not sure.
I can see my approach above running into difficulties on huge NUMA nodes,
but for 8-10 cores per socket it works well. And it is easy to use ;)
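If you want to double-check what the mapping actually does, here is a small
sketch of a hybrid MPI+OpenMP test I would use alongside --report-bindings:
each thread prints the logical CPU it runs on. The file name
(check_binding.c) is just an example, and it is Linux-only because it relies
on sched_getcpu().

/* check_binding.c: each OpenMP thread of each MPI rank reports its CPU.
 * Build with something like:  mpicc -fopenmp check_binding.c -o check_binding
 * Linux-specific: sched_getcpu() requires _GNU_SOURCE.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided, rank;
    char host[256];

    /* FUNNELED is enough here: only the master thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));

    #pragma omp parallel
    {
        /* sched_getcpu() returns the logical CPU this thread currently runs on. */
        printf("host %s  rank %d  thread %d/%d  cpu %d\n",
               host, rank, omp_get_thread_num(), omp_get_num_threads(),
               sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}

Run it with the same mpirun line as above and compare the printed CPUs with
what --report-bindings shows for each rank.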
--
Kind regards Nick