2015-11-05 18:51 GMT+01:00 Dave Love <d.l...@liverpool.ac.uk>:

> Nick Papior <nickpap...@gmail.com> writes:
>
> > This is what I do to successfully get the best performance for my
> > application using OpenMP and OpenMPI:
> >
> > (note this is for 8 cores per socket)
> >
> > mpirun -x OMP_PROC_BIND=true --report-bindings -x OMP_NUM_THREADS=8
> > --map-by ppr:1:socket:pe=8 <exec>
> >
> > It assigns 8 cores to each MPI process, one process per socket; each MPI
> > process gets 8 OpenMP threads, and the threads are bound for affinity. This
> > is for OpenMPI >= 1.8.
>
> I assume there's a good reason, since it's the default, but what makes
> binding to sockets better than to a lower level (if there is a relevant
> lower level)?  The latency and bandwidth may be significantly different
> between different localities on the same socket.
>
Sure, I guess you should check the NUMA latencies/distances (e.g. with
numactl --hardware) to make sure you do not end up on a _bad_ node?
I am not sure.
I can see difficulties with the approach above for huge NUMA nodes, but for
8-10 cores per socket it works pretty well. And it is easy to use ;)
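
As a quick sanity check, here is a minimal hybrid MPI+OpenMP sketch (Linux-only,
since it uses sched_getcpu(); the file name is just an example) that reports
which core each thread of each rank lands on, so you can verify that the
--map-by ppr:1:socket:pe=8 mapping and OMP_PROC_BIND=true did what you expect:

  /* affinity_check.c - print the CPU each OpenMP thread of each MPI rank
   * runs on.  Build with e.g.:
   *   mpicc -fopenmp affinity_check.c -o affinity_check
   * and launch it with the mpirun line quoted above.
   */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sched.h>
  #include <mpi.h>
  #include <omp.h>

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      #pragma omp parallel
      {
          /* Each thread reports its id and the core it is currently on. */
          printf("rank %d thread %d of %d on cpu %d\n",
                 rank, omp_get_thread_num(), omp_get_num_threads(),
                 sched_getcpu());
      }

      MPI_Finalize();
      return 0;
  }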

-- 
Kind regards Nick
