On Sep 5, 2014, at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:

> It would be about the worst thing you can do, to be honest. Reason is that 
> each socket is typically a separate NUMA region, and so the shared memory 
> system would be sub-optimized in that configuration. It would be much better 
> to map-by core to avoid the NUMA issues.

+1

Also, per the pictures I posted: perhaps your stress testing is deliberately 
trying to generate more network traffic.  In general, though, most apps benefit 
from shared memory communication, not network communication.  Regardless of 
your network, shared memory communication is almost always faster.  So for real 
jobs, you should a) consider mapping by core, especially if your individual MPI 
processes are single-threaded, and b) smush as many of them together on as few 
servers as possible in order to maximize shared memory communication and 
minimize network communication.
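
To put rough numbers on point b), here is a minimal sketch in Python.  It 
assumes a toy model in which every pair of ranks is equally likely to talk to 
each other and ranks fill servers in order; the function name and the model are 
illustrative assumptions, not anything Open MPI provides or measures.

```python
from itertools import combinations


def shared_memory_pair_fraction(num_ranks, ranks_per_node):
    """Fraction of rank pairs that land on the same node.

    Toy model (an assumption for illustration): ranks are packed onto
    nodes in order, so rank r lives on node r // ranks_per_node, and
    every pair of ranks is treated as equally likely to communicate.
    """
    if num_ranks < 2:
        raise ValueError("need at least two ranks to form a pair")
    if ranks_per_node < 1:
        raise ValueError("ranks_per_node must be positive")

    node_of = [rank // ranks_per_node for rank in range(num_ranks)]
    pairs = list(combinations(range(num_ranks), 2))
    # A pair can use shared memory only if both ranks sit on one node.
    same_node = sum(1 for a, b in pairs if node_of[a] == node_of[b])
    return same_node / len(pairs)


if __name__ == "__main__":
    # 16 ranks spread thinly versus packed tightly.
    for per_node in (1, 4, 16):
        frac = shared_memory_pair_fraction(16, per_node)
        print(f"{per_node:2d} ranks/node: {frac:.2f} of pairs share a node")
```

In this model, 16 ranks at one per server have no pairs that can use shared 
memory, 4 per server gives 20% of pairs, and all 16 on one server gives 100%.  
Real apps rarely talk all-to-all, so treat this as a trend, not a prediction.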

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
