Hello Ralph,
I do understand that "slot" is an abstract term and isn't tied down to
any particular piece of hardware. What I am trying to understand is how
"slot" came to be equivalent to "socket" in my second and third examples,
but "core" in my first example. As far as I can tell, MPI ranks should
have been assigned the same way in all three examples. Why weren't they?
You mentioned that, when using "--rank-by slot", the ranks are assigned
round-robin by scheduler entry; does this mean that the scheduler
entries change based on the mapping algorithm (the only thing I changed
in my examples), and that this results in ranks being assigned differently?
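If it helps clarify what I mean, here is how I would pin the ranking policy
explicitly while changing only the mapping (just a sketch, reusing the same
flags from my examples below):
$> mpirun -n 128 --map-by core --rank-by slot --report-bindings true
$> mpirun -n 128 --map-by socket --rank-by slot --report-bindings true
Would the rank numbering then be identical across both runs, or does the
mapper still change the slot ordering?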
Thanks again,
David
On 11/30/2016 01:23 PM, r...@open-mpi.org wrote:
I think you have confused “slot” with a physical “core”. The two have
absolutely nothing to do with each other.
A “slot” is nothing more than a scheduling entry in which a process can be
placed. So when you use --rank-by slot, the ranks are assigned round-robin by
scheduler entry - i.e., you assign all the ranks on the first node, then assign
all the ranks on the next node, etc.
It doesn’t matter where those ranks are placed, or what core or socket they are
running on. We just blindly go through and assign numbers.
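You can also request that policy explicitly rather than relying on the
default - for example (same options from your runs, purely illustrative):
$> mpirun -n 128 --map-by socket --rank-by slot --report-bindings true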
If you rank-by core, then we cycle across the procs by looking at the core
number they are bound to, assigning all the procs on a node before moving to
the next node. If you rank-by socket, then we cycle across the procs on a node
by round-robin of sockets, assigning all procs on the node before moving to the
next node. If you then added “span” to that directive, we’d round-robin by
socket across all nodes before circling back around to the next proc on the
first node.
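As a concrete way to see the difference (illustrative only - same flags and
the same 4-node layout from your examples), hold the mapping fixed and vary
just the ranking policy:
$> mpirun -n 128 --map-by socket --rank-by core --report-bindings true
$> mpirun -n 128 --map-by socket --rank-by socket --report-bindings true
Changing --rank-by alone should only change which MCW rank number is attached
to each binding, not the bindings themselves.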
HTH
Ralph
On Nov 30, 2016, at 11:26 AM, David Shrader <dshra...@lanl.gov> wrote:
Hello All,
The man page for mpirun says that the default ranking procedure is round-robin
by slot. It doesn't seem to be that straightforward to me, though, and I
wanted to ask about the behavior.
To help illustrate my confusion, here are a few examples where the ranking
behavior changed based on the mapping behavior, which doesn't make sense to me
yet. First, here is a simple map-by-core run (using 4 nodes of 32 CPU cores
each):
$> mpirun -n 128 --map-by core --report-bindings true
[gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119614] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119614] MCW rank 2 bound to socket 0[core 2[hwt 0]]:
[././B/././././././././././././././.][./././././././././././././././././.]
...output snipped...
Things look as I would expect: ranking happens round-robin through the CPU
cores. Now, here's a map-by-socket example:
$> mpirun -n 128 --map-by socket --report-bindings true
[gr0649.localdomain:119926] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119926] MCW rank 1 bound to socket 1[core 18[hwt 0]]:
[./././././././././././././././././.][B/././././././././././././././././.]
[gr0649.localdomain:119926] MCW rank 2 bound to socket 0[core 1[hwt 0]]:
[./B/./././././././././././././././.][./././././././././././././././././.]
...output snipped...
Why is rank 1 on a different socket? I know I am mapping by socket in this
example, but, fundamentally, nothing should really be different in terms of
ranking, correct? The same number of processes is available on each host as in
the first example, and in the same locations.
How is "slot" different in this case? If I use "--rank-by core," I recover the
output from the first example.
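That is, a run like the following (same flags as above, with the ranking
policy made explicit) gives me the same report-bindings output as the first
example:
$> mpirun -n 128 --map-by socket --rank-by core --report-bindings true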
I thought that maybe "--rank-by slot" might be following something laid down by
"--map-by", but the following example shows that isn't completely correct, either:
$> mpirun -n 128 --map-by socket:span --report-bindings true
[gr0649.localdomain:119319] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119319] MCW rank 1 bound to socket 1[core 18[hwt 0]]:
[./././././././././././././././././.][B/././././././././././././././././.]
[gr0649.localdomain:119319] MCW rank 2 bound to socket 0[core 1[hwt 0]]:
[./B/./././././././././././././././.][./././././././././././././././././.]
...output snipped...
If ranking by slot were somehow following something left over by mapping, I would have
expected rank 2 to end up on a different host. So, now I don't know what to expect from
using "--rank-by slot." Does anyone have any pointers?
Thank you for the help!
David
--
David Shrader
HPC-ENV High Performance Computer Systems
Los Alamos National Lab
Email: dshrader <at> lanl.gov
--
David Shrader
HPC-ENV High Performance Computer Systems
Los Alamos National Lab
Email: dshrader <at> lanl.gov
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users