Well, FWIW, it looks like master (and hence 3.0) behave the way you wanted:
$ mpirun -map-by socket --report-bindings --app ./appfile
[rhc001:48492] MCW rank 0: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
[rhc001:48492] MCW rank 1: [../../../../../../../../../../..
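
For reference, the appfile given to --app lists one application context per line, each with its own process count and executable; the file contents below are only a hypothetical sketch, not the poster's actual setup:

$ cat appfile
-np 2 ./app_low_comm
-np 2 ./app_high_comm

mpirun launches every context as part of the same job, and the --map-by policy given on the command line then applies to all of those ranks.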
Hello Ralph,
I need to support several different apps, each app with entirely different MPI communication needs, and each app either single-threaded or multi-threaded. For example, one app tends to do very little message passing, and another app does much more message passing. And some of our
Well, yes and no. Yes, your cpu loads will balance better across nodes (balancing across sockets doesn't do much for you). However, your overall application performance may be the poorest in that arrangement if your app does a lot of communication, because the layout minimizes the use of shared memory.
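
To make that tradeoff concrete, here is a sketch of the two layouts using standard Open MPI 1.8+/2.x options; ./app, the rank count, and the resulting binding output are hypothetical and will differ on your hardware:

# Round-robin ranks across sockets: CPU load spreads out,
# but consecutive ranks land on different sockets.
$ mpirun -np 4 --map-by socket --bind-to core --report-bindings ./app

# Fill consecutive cores on one socket first: consecutive ranks share
# a socket, so more of their traffic can go through shared memory.
$ mpirun -np 4 --map-by core --bind-to core --report-bindings ./app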
Hello Ralph,
Thank you for your comments.
My understanding, from reading Jeff's blog on V1.5 processor affinity, is that the bindings in Example 1 balance the load better than the bindings in Example 2. Therefore I would like to obtain the bindings in Example 1, but using Open MPI 2.1.1, and