On Sep 5, 2014, at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
> It would be about the worst thing you can do, to be honest. Reason is that
> each socket is typically a separate NUMA region, and so the shared memory
> system would be sub-optimized in that configuration. It would be much better
> to map-by core to avoid the NUMA issues.

+1

Also, per the pictures I posted, perhaps in your stress testing you're trying to add more network traffic, but in general, most apps benefit from shared memory communication, not network communication. Regardless of your network, shared memory communication is almost always faster.

So for real jobs, you should a) consider mapping by core, especially if your individual MPI processes are single-threaded, and b) smush as many of them together on as few servers as possible in order to maximize shared memory communication and minimize network communication.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
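
To make the mapping advice above concrete, here is a toy Python sketch of how the two policies place ranks on a single node. It is only a simplified model, not Open MPI's actual mapper: the function names, the 2-socket / 8-cores-per-socket node, and the assumption that rank r mostly talks to rank r+1 are all made up for illustration.

```python
# Toy model (assumption, not Open MPI code): place ranks on one node and
# count how many nearest-neighbor pairs end up on different sockets,
# i.e. in different NUMA regions.

def map_by_core(n_ranks, sockets, cores_per_socket):
    """Fill consecutive cores first: ranks 0..cores_per_socket-1 share socket 0, etc."""
    return [(r // cores_per_socket) % sockets for r in range(n_ranks)]

def map_by_socket(n_ranks, sockets):
    """Round-robin across sockets: rank r lands on socket r % sockets."""
    return [r % sockets for r in range(n_ranks)]

def cross_socket_neighbors(placement):
    """Count adjacent rank pairs (r, r+1) that sit on different sockets."""
    return sum(1 for a, b in zip(placement, placement[1:]) if a != b)

if __name__ == "__main__":
    # Hypothetical node: 2 sockets x 8 cores, 16 single-threaded ranks.
    by_core = map_by_core(16, sockets=2, cores_per_socket=8)
    by_socket = map_by_socket(16, sockets=2)
    print("map-by core  :", by_core, cross_socket_neighbors(by_core))
    print("map-by socket:", by_socket, cross_socket_neighbors(by_socket))
```

In this model, mapping by core leaves a single neighbor pair straddling the two sockets, while mapping by socket puts every neighbor pair on opposite sockets. Real applications have other communication patterns, so treat this as intuition only; on an actual cluster, mpirun's --report-bindings option shows where the ranks really landed.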