[OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread David Shrader

Hello All,

The man page for mpirun says that the default ranking procedure is 
round-robin by slot. It doesn't seem to be that straight-forward to me, 
though, and I wanted to ask about the behavior.


To help illustrate my confusion, here are a few examples where the 
ranking behavior changed based on the mapping behavior, which doesn't 
make sense to me, yet. First, here is a simple map by core (using 4 
nodes of 32 cpu cores each):


$> mpirun -n 128 --map-by core --report-bindings true
[gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119614] MCW rank 1 bound to socket 0[core 1[hwt 0]]: 
[./B/./././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119614] MCW rank 2 bound to socket 0[core 2[hwt 0]]: 
[././B/././././././././././././././.][./././././././././././././././././.]

...output snipped...

Things look as I would expect: ranking happens round-robin through the 
cpu cores. Now, here's a map by socket example:


$> mpirun -n 128 --map-by socket --report-bindings true
[gr0649.localdomain:119926] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119926] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
[./././././././././././././././././.][B/././././././././././././././././.]
[gr0649.localdomain:119926] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
[./B/./././././././././././././././.][./././././././././././././././././.]

...output snipped...

Why is rank 1 on a different socket? I know I am mapping by socket in 
this example, but, fundamentally, nothing should really be different in 
terms of ranking, correct? The same number of processes are available on 
each host as in the first example, and available in the same locations. 
How is "slot" different in this case? If I use "--rank-by core," I 
recover the output from the first example.


I thought that maybe "--rank-by slot" might be following something laid 
down by "--map-by", but the following example shows that isn't 
completely correct, either:


$> mpirun -n 128 --map-by socket:span --report-bindings true
[gr0649.localdomain:119319] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119319] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
[./././././././././././././././././.][B/././././././././././././././././.]
[gr0649.localdomain:119319] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
[./B/./././././././././././././././.][./././././././././././././././././.]

...output snipped...

If ranking by slot were somehow following something left over by 
mapping, I would have expected rank 2 to end up on a different host. So, 
now I don't know what to expect from using "--rank-by slot." Does anyone 
have any pointers?


Thank you for the help!
David

--
David Shrader
HPC-ENV High Performance Computer Systems
Los Alamos National Lab
Email: dshrader  lanl.gov



Re: [OMPI users] Issues building Open MPI 2.0.1 with PGI 16.10 on macOS

2016-11-30 Thread Matt Thompson
Well, jtull over at PGI seemed to have the "magic sauce":

http://www.pgroup.com/userforum/viewtopic.php?p=21105#21105

Namely, I think it's the siterc file. I'm not sure which of the adaptations
fixes the issue yet, though.

On Mon, Nov 28, 2016 at 3:11 PM, Jeff Hammond wrote:

> attached config.log that contains the details of the following failures is
> the best way to make forward-progress here.  that none of the system
> headers are detected suggests a rather serious compiler problem that may
> not have anything to do with headers.
>
> checking for sys/types.h... no
> checking for sys/stat.h... no
> checking for stdlib.h... no
> checking for string.h... no
> checking for memory.h... no
> checking for strings.h... no
> checking for inttypes.h... no
> checking for stdint.h... no
> checking for unistd.h... no
>
>
> On Mon, Nov 28, 2016 at 9:49 AM, Matt Thompson  wrote:
>
>> Hmm. Well, I definitely have /usr/include/stdint.h as I previously was
>> trying work with clang as compiler stack. And as near as I can tell, Open
>> MPI's configure is seeing /usr/include as oldincludedir, but maybe that's
>> not how it finds it?
>>
>> If I check my configure output:
>>
>> 
>> 
>> == Configuring Open MPI
>> 
>> 
>>
>> *** Startup tests
>> checking build system type... x86_64-apple-darwin15.6.0
>> 
>> checking for sys/types.h... yes
>> checking for sys/stat.h... yes
>> checking for stdlib.h... yes
>> checking for string.h... yes
>> checking for memory.h... yes
>> checking for strings.h... yes
>> checking for inttypes.h... yes
>> checking for stdint.h... yes
>> checking for unistd.h... yes
>>
>> So, the startup saw it. But:
>>
>> --- MCA component event:libevent2022 (m4 configuration macro, priority 80)
>> checking for MCA component event:libevent2022 compile mode... static
>> checking libevent configuration args... --disable-dns --disable-http
>> --disable-rpc --disable-openssl --enable-thread-support --disable-evport
>> configure: OPAL configuring in opal/mca/event/libevent2022/libevent
>> configure: running /bin/sh './configure' --disable-dns --disable-http
>> --disable-rpc --disable-openssl --enable-thread-support --disable-evport
>> '--disable-wrapper-rpath' 'CC=pgcc' 'CXX=pgc++' 'FC=pgfortran'
>> 'CFLAGS=-m64' 'CXXFLAGS=-m64' 'FCFLAGS=-m64' '--without-verbs'
>> '--prefix=/Users/mathomp4/installed/Compiler/pgi-16.10/openmpi/2.0.1'
>> 'CPPFLAGS=-I/Users/mathomp4/src/MPI/openmpi-2.0.1
>> -I/Users/mathomp4/src/MPI/openmpi-2.0.1
>> -I/Users/mathomp4/src/MPI/openmpi-2.0.1/opal/include
>> -I/Users/mathomp4/src/MPI/openmpi-2.0.1/opal/mca/hwloc/hwloc1112/hwloc/include
>> -Drandom=opal_random' --cache-file=/dev/null --srcdir=. --disable-option-checking
>> checking for a BSD-compatible install... /usr/bin/install -c
>> 
>> checking for sys/types.h... no
>> checking for sys/stat.h... no
>> checking for stdlib.h... no
>> checking for string.h... no
>> checking for memory.h... no
>> checking for strings.h... no
>> checking for inttypes.h... no
>> checking for stdint.h... no
>> checking for unistd.h... no
>>
>> So, it's like whatever magic found stdint.h for the startup isn't passed
>> down to libevent when it builds? As I scan the configure output, PMIx sees
>> stdint.h in its section and ROMIO sees it as well, but not libevent2022.
>> The Makefiles inside of libevent2022 do have 'oldincludedir =
>> /usr/include'. Hmm.
>>
>>
>>
>> On Mon, Nov 28, 2016 at 11:39 AM, Bennet Fauber  wrote:
>>
>>> I think PGI uses installed GCC components for some parts of standard C
>>> (at least for some things on Linux, it does; and I imagine it is
>>> similar for Mac).  If you look at the post at
>>>
>>> http://www.pgroup.com/userforum/viewtopic.php?t=5147&sid=17f3afa2cd0eec05b0f4e54a60f50479
>>>
>>> The problem seems to have been one with the Xcode configuration:
>>>
>>> "It turns out my Xcode was messed up as I was missing /usr/include/.
>>>  After rerunning xcode-select --install it works now."
>>>
>>> On my OS X 10.11.6, I have /usr/include/stdint.h without having the
>>> PGI compilers.  This may be related to the GNU command line tools
>>> installation...?  I think that is now optional and may need to be installed.
>>>
>>> Sorry for the noise if this is irrelevant.
>>>
>>>
>>>
>>> On Mon, Nov 28, 2016 at 11:18 AM, Jeff Hammond wrote:
>>> > The following is the code that fails.  The comments indicate the likely
>>> > source of the error.
>>> >
>>> > Please see
>>> > http://www.pgroup.com/userforum/viewtopic.php?t=5147&sid=17f3afa2cd0eec05b0f4e54a60f50479
>>> > and other entries on https://www.google.com/search?q=pgi+stdint.h.
>>> >
>>> > You may want to debug libevent by itself
>>> > (https://github.com/libevent/libevent).
>>> >
>>> > I do not have PGI installed on my Mac, so I can't reproduce this.
>>> >
>>> > Best,
>>> >
>>> > Jeff
>>> >
>>> > /**
>>> >
>

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
I think you have confused “slot” with a physical “core”. The two have 
absolutely nothing to do with each other.

A “slot” is nothing more than a scheduling entry in which a process can be 
placed. So when you --rank-by slot, the ranks are assigned round-robin by 
scheduler entry - i.e., you assign all the ranks on the first node, then assign 
all the ranks on the next node, etc.

It doesn’t matter where those ranks are placed, or what core or socket they are 
running on. We just blindly go thru and assign numbers.

If you rank-by core, then we cycle across the procs by looking at the core 
number they are bound to, assigning all the procs on a node before moving to 
the next node. If you rank-by socket, then you cycle across the procs on a node 
by round-robin of sockets, assigning all procs on the node before moving to the 
next node. If you then added “span” to that directive, we’d round-robin by 
socket across all nodes before circling around to the next proc on this node.
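
To make that concrete, here is a rough sketch in Python of the "rank-by slot" idea - purely an illustration of the behavior described above (the function name and data structure are made up for this example), not Open MPI's actual code. The ranker simply walks each node's process list in the order the mapper filled it:

def rank_by_slot(procs_per_node):
    # procs_per_node: one list per node, holding that node's processes in
    # the order the mapper placed them there (i.e., the node's "slots").
    ranks, next_rank = {}, 0
    for node_procs in procs_per_node:        # finish a node before moving on
        for proc in node_procs:              # walk the node's mapping order
            ranks[proc] = next_rank
            next_rank += 1
    return ranks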

HTH
Ralph


> On Nov 30, 2016, at 11:26 AM, David Shrader  wrote:
> 
> Hello All,
> 
> The man page for mpirun says that the default ranking procedure is 
> round-robin by slot. It doesn't seem to be that straight-forward to me, 
> though, and I wanted to ask about the behavior.
> 
> To help illustrate my confusion, here are a few examples where the ranking 
> behavior changed based on the mapping behavior, which doesn't make sense to 
> me, yet. First, here is a simple map by core (using 4 nodes of 32 cpu cores 
> each):
> 
> $> mpirun -n 128 --map-by core --report-bindings true
> [gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119614] MCW rank 1 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119614] MCW rank 2 bound to socket 0[core 2[hwt 0]]: 
> [././B/././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> Things look as I would expect: ranking happens round-robin through the cpu 
> cores. Now, here's a map by socket example:
> 
> $> mpirun -n 128 --map-by socket --report-bindings true
> [gr0649.localdomain:119926] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119926] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
> [./././././././././././././././././.][B/././././././././././././././././.]
> [gr0649.localdomain:119926] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> Why is rank 1 on a different socket? I know I am mapping by socket in this 
> example, but, fundamentally, nothing should really be different in terms of 
> ranking, correct? The same number of processes are available on each host as 
> in the first example, and available in the same locations. How is "slot" 
> different in this case? If I use "--rank-by core," I recover the output from 
> the first example.
> 
> I thought that maybe "--rank-by slot" might be following something laid down 
> by "--map-by", but the following example shows that isn't completely correct, 
> either:
> 
> $> mpirun -n 128 --map-by socket:span --report-bindings true
> [gr0649.localdomain:119319] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119319] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
> [./././././././././././././././././.][B/././././././././././././././././.]
> [gr0649.localdomain:119319] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> If ranking by slot were somehow following something left over by mapping, I 
> would have expected rank 2 to end up on a different host. So, now I don't 
> know what to expect from using "--rank-by slot." Does anyone have any 
> pointers?
> 
> Thank you for the help!
> David
> 
> -- 
> David Shrader
> HPC-ENV High Performance Computer Systems
> Los Alamos National Lab
> Email: dshrader  lanl.gov
> 


Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread David Shrader

Hello Ralph,

I do understand that "slot" is an abstract term and isn't tied down to 
any particular piece of hardware. What I am trying to understand is how 
"slot" came to be equivalent to "socket" in my second and third example, 
but "core" in my first example. As far as I can tell, MPI ranks should 
have been assigned the same in all three examples. Why weren't they?


You mentioned that, when using "--rank-by slot", the ranks are assigned 
round-robin by scheduler entry; does this mean that the scheduler 
entries change based on the mapping algorithm (the only thing I changed 
in my examples) and this results in ranks being assigned differently?


Thanks again,
David

On 11/30/2016 01:23 PM, r...@open-mpi.org wrote:

I think you have confused “slot” with a physical “core”. The two have 
absolutely nothing to do with each other.

A “slot” is nothing more than a scheduling entry in which a process can be 
placed. So when you --rank-by slot, the ranks are assigned round-robin by 
scheduler entry - i.e., you assign all the ranks on the first node, then assign 
all the ranks on the next node, etc.

It doesn’t matter where those ranks are placed, or what core or socket they are 
running on. We just blindly go thru and assign numbers.

If you rank-by core, then we cycle across the procs by looking at the core 
number they are bound to, assigning all the procs on a node before moving to 
the next node. If you rank-by socket, then you cycle across the procs on a node 
by round-robin of sockets, assigning all procs on the node before moving to the 
next node. If you then added “span” to that directive, we’d round-robin by 
socket across all nodes before circling around to the next proc on this node.

HTH
Ralph



On Nov 30, 2016, at 11:26 AM, David Shrader  wrote:

Hello All,

The man page for mpirun says that the default ranking procedure is round-robin 
by slot. It doesn't seem to be that straight-forward to me, though, and I 
wanted to ask about the behavior.

To help illustrate my confusion, here are a few examples where the ranking 
behavior changed based on the mapping behavior, which doesn't make sense to me, 
yet. First, here is a simple map by core (using 4 nodes of 32 cpu cores each):

$> mpirun -n 128 --map-by core --report-bindings true
[gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119614] MCW rank 1 bound to socket 0[core 1[hwt 0]]: 
[./B/./././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119614] MCW rank 2 bound to socket 0[core 2[hwt 0]]: 
[././B/././././././././././././././.][./././././././././././././././././.]
...output snipped...

Things look as I would expect: ranking happens round-robin through the cpu 
cores. Now, here's a map by socket example:

$> mpirun -n 128 --map-by socket --report-bindings true
[gr0649.localdomain:119926] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119926] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
[./././././././././././././././././.][B/././././././././././././././././.]
[gr0649.localdomain:119926] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
[./B/./././././././././././././././.][./././././././././././././././././.]
...output snipped...

Why is rank 1 on a different socket? I know I am mapping by socket in this example, but, 
fundamentally, nothing should really be different in terms of ranking, correct? The same number of 
processes are available on each host as in the first example, and available in the same locations. 
How is "slot" different in this case? If I use "--rank-by core," I recover the 
output from the first example.

I thought that maybe "--rank-by slot" might be following something laid down by 
"--map-by", but the following example shows that isn't completely correct, either:

$> mpirun -n 128 --map-by socket:span --report-bindings true
[gr0649.localdomain:119319] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././././././././././././.][./././././././././././././././././.]
[gr0649.localdomain:119319] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
[./././././././././././././././././.][B/././././././././././././././././.]
[gr0649.localdomain:119319] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
[./B/./././././././././././././././.][./././././././././././././././././.]
...output snipped...

If ranking by slot were somehow following something left over by mapping, I would have 
expected rank 2 to end up on a different host. So, now I don't know what to expect from 
using "--rank-by slot." Does anyone have any pointers?

Thank you for the help!
David

--
David Shrader
HPC-ENV High Performance Computer Systems
Los Alamos National Lab
Email: dshrader  lanl.gov


Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
“slot” never became equivalent to “socket”, or to “core”. Here is what happened:

* for your first example: the mapper assigns the first process to the first node 
because there is a free core there, and you said to map-by core. It goes on to 
assign the second process to the second core, and the third process to the 
third core, etc. until we reach the defined #procs for that node (i.e., the 
number of assigned “slots” for that node). When it goes to rank the procs, the 
ranker starts with the first process assigned on the first node - this process 
occupies the first “slot”, and so it gets rank 0. The ranker then assigns rank 
1 to the second process it assigned to the first node, as that process occupies 
the second “slot”. Etc.

* your 2nd example: the mapper assigns the first process to the first socket of 
the first node, the second process to the second socket of the first node, and 
the third process to the first socket of the first node, until all the “slots” 
for that node have been filled. The ranker then starts with the first process 
that was assigned to the first node, and gives it rank 0. The ranker then 
assigns rank 1 to the second process that was assigned to the node - that would 
be the first proc mapped to the second socket. The ranker then assigns rank 2 
to the third proc assigned to the node - that would be the 2nd proc assigned to 
the first socket.

* your 3rd example: the mapper assigns the first process to the first socket of 
the first node, the second process to the second socket of the first node, and 
the third process to the first socket of the second node, continuing around 
until all procs have been mapped. The ranker then starts with the first proc 
assigned to the first node, and gives it rank 0. The ranker then assigns rank 1 
to the second process assigned to the first node (because we are ranking by 
slot!), which corresponds to the first proc mapped to the second socket. The 
ranker then assigns rank 2 to the third process assigned to the first node, 
which corresponds to the second proc mapped to the first socket of that node.

So you can see that you will indeed get the same relative ranking, even though 
the mapping was done using a different algorithm.
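
To tie that to the --report-bindings output, here is a small toy model in Python of what was just described. The node layout (4 nodes x 2 sockets x 18 cores per socket, 32 procs per node) and the exact span ordering are assumptions made for illustration - this is not Open MPI's implementation - but it reproduces the first three ranks on the first node for all three examples:

NODES, SOCKETS, CORES_PER_SOCKET, NPROCS = 4, 2, 18, 128
PER_NODE = NPROCS // NODES                              # 32 procs per node

def map_by_core():
    # fill a node's cores in order: (socket 0, core 0), (socket 0, core 1), ...
    return [[(c // CORES_PER_SOCKET, c) for c in range(PER_NODE)]
            for _ in range(NODES)]

def map_by_socket(span=False):
    per_node = [[] for _ in range(NODES)]
    next_core = [[0] * SOCKETS for _ in range(NODES)]   # next free core per socket
    if span:
        # round-robin over every socket in the job before wrapping around
        one_pass = [(n, s) for n in range(NODES) for s in range(SOCKETS)]
        targets = one_pass * (NPROCS // len(one_pass))
    else:
        # fill one node at a time, cycling that node's sockets
        targets = [(n, s) for n in range(NODES)
                   for _ in range(PER_NODE // SOCKETS)
                   for s in range(SOCKETS)]
    for n, s in targets:
        per_node[n].append((s, s * CORES_PER_SOCKET + next_core[n][s]))
        next_core[n][s] += 1
    return per_node

def rank_by_slot(per_node):
    # the ranker: walk each node's list in mapping order, numbering as it goes
    return [loc for node in per_node for loc in node]   # list index == MPI rank

for label, layout in [("core", map_by_core()),
                      ("socket", map_by_socket()),
                      ("socket:span", map_by_socket(span=True))]:
    ranked = rank_by_slot(layout)
    print(f"--map-by {label:12s} ranks 0-2:",
          "; ".join(f"socket {s} core {c}" for s, c in ranked[:3]))

Running this prints "socket 0 core 0; socket 0 core 1; socket 0 core 2" for the core mapping and "socket 0 core 0; socket 1 core 18; socket 0 core 1" for both socket mappings, matching the outputs reported earlier in the thread.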

HTH
Ralph

> On Nov 30, 2016, at 2:16 PM, David Shrader  wrote:
> 
> Hello Ralph,
> 
> I do understand that "slot" is an abstract term and isn't tied down to any 
> particular piece of hardware. What I am trying to understand is how "slot" 
> came to be equivalent to "socket" in my second and third example, but "core" 
> in my first example. As far as I can tell, MPI ranks should have been 
> assigned the same in all three examples. Why weren't they?
> 
> You mentioned that, when using "--rank-by slot", the ranks are assigned 
> round-robin by scheduler entry; does this mean that the scheduler entries 
> change based on the mapping algorithm (the only thing I changed in my 
> examples) and this results in ranks being assigned differently?
> 
> Thanks again,
> David
> 
> On 11/30/2016 01:23 PM, r...@open-mpi.org wrote:
>> I think you have confused “slot” with a physical “core”. The two have 
>> absolutely nothing to do with each other.
>> 
>> A “slot” is nothing more than a scheduling entry in which a process can be 
>> placed. So when you --rank-by slot, the ranks are assigned round-robin by 
>> scheduler entry - i.e., you assign all the ranks on the first node, then 
>> assign all the ranks on the next node, etc.
>> 
>> It doesn’t matter where those ranks are placed, or what core or socket they 
>> are running on. We just blindly go thru and assign numbers.
>> 
>> If you rank-by core, then we cycle across the procs by looking at the core 
>> number they are bound to, assigning all the procs on a node before moving to 
>> the next node. If you rank-by socket, then you cycle across the procs on a 
>> node by round-robin of sockets, assigning all procs on the node before 
>> moving to the next node. If you then added “span” to that directive, we’d 
>> round-robin by socket across all nodes before circling around to the next 
>> proc on this node.
>> 
>> HTH
>> Ralph
>> 
>> 
>>> On Nov 30, 2016, at 11:26 AM, David Shrader  wrote:
>>> 
>>> Hello All,
>>> 
>>> The man page for mpirun says that the default ranking procedure is 
>>> round-robin by slot. It doesn't seem to be that straight-forward to me, 
>>> though, and I wanted to ask about the behavior.
>>> 
>>> To help illustrate my confusion, here are a few examples where the ranking 
>>> behavior changed based on the mapping behavior, which doesn't make sense to 
>>> me, yet. First, here is a simple map by core (using 4 nodes of 32 cpu cores 
>>> each):
>>> 
>>> $> mpirun -n 128 --map-by core --report-bindings true
>>> [gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
>>> [B/././././././././././././././././.][./././././././././././././././././.]
>>> [gr0649.localdomain:119614] MCW rank 1 bound to socket 0[core 1[hwt 0]]: 
>>> [./B/./././././.

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread David Shrader
Thank you for the explanation! I understand what is going on now: there 
is a process list for each node whose order is dependent on the mapping 
policy, and the ranker, when using "slot," walks through that list. 
Makes sense.


Thank you again!
David

On 11/30/2016 04:46 PM, r...@open-mpi.org wrote:

“slot” never became equivalent to “socket”, or to “core”. Here is what happened:

* for your first example: the mapper assigns the first process to the first node 
because there is a free core there, and you said to map-by core. It goes on to 
assign the second process to the second core, and the third process to the 
third core, etc. until we reach the defined #procs for that node (i.e., the 
number of assigned “slots” for that node). When it goes to rank the procs, the 
ranker starts with the first process assigned on the first node - this process 
occupies the first “slot”, and so it gets rank 0. The ranker then assigns rank 
1 to the second process it assigned to the first node, as that process occupies 
the second “slot”. Etc.

* your 2nd example: the mapper assigns the first process to the first socket of 
the first node, the second process to the second socket of the first node, and 
the third process to the first socket of the first node, until all the “slots” 
for that node have been filled. The ranker then starts with the first process 
that was assigned to the first node, and gives it rank 0. The ranker then 
assigns rank 1 to the second process that was assigned to the node - that would 
be the first proc mapped to the second socket. The ranker then assigns rank 2 
to the third proc assigned to the node - that would be the 2nd proc assigned to 
the first socket.

* your 3rd example: the mapper assigns the first process to the first socket of 
the first node, the second process to the second socket of the first node, and 
the third process to the first socket of the second node, continuing around 
until all procs have been mapped. The ranker then starts with the first proc 
assigned to the first node, and gives it rank 0. The ranker then assigns rank 1 
to the second process assigned to the first node (because we are ranking by 
slot!), which corresponds to the first proc mapped to the second socket. The 
ranker then assigns rank 2 to the third process assigned to the first node, 
which corresponds to the second proc mapped to the first socket of that node.

So you can see that you will indeed get the same relative ranking, even though 
the mapping was done using a different algorithm.

HTH
Ralph


On Nov 30, 2016, at 2:16 PM, David Shrader  wrote:

Hello Ralph,

I do understand that "slot" is an abstract term and isn't tied down to any particular piece of hardware. What 
I am trying to understand is how "slot" came to be equivalent to "socket" in my second and third 
example, but "core" in my first example. As far as I can tell, MPI ranks should have been assigned the same 
in all three examples. Why weren't they?

You mentioned that, when using "--rank-by slot", the ranks are assigned 
round-robin by scheduler entry; does this mean that the scheduler entries change based on 
the mapping algorithm (the only thing I changed in my examples) and this results in ranks 
being assigned differently?

Thanks again,
David

On 11/30/2016 01:23 PM, r...@open-mpi.org wrote:

I think you have confused “slot” with a physical “core”. The two have 
absolutely nothing to do with each other.

A “slot” is nothing more than a scheduling entry in which a process can be 
placed. So when you --rank-by slot, the ranks are assigned round-robin by 
scheduler entry - i.e., you assign all the ranks on the first node, then assign 
all the ranks on the next node, etc.

It doesn’t matter where those ranks are placed, or what core or socket they are 
running on. We just blindly go thru and assign numbers.

If you rank-by core, then we cycle across the procs by looking at the core 
number they are bound to, assigning all the procs on a node before moving to 
the next node. If you rank-by socket, then you cycle across the procs on a node 
by round-robin of sockets, assigning all procs on the node before moving to the 
next node. If you then added “span” to that directive, we’d round-robin by 
socket across all nodes before circling around to the next proc on this node.

HTH
Ralph



On Nov 30, 2016, at 11:26 AM, David Shrader  wrote:

Hello All,

The man page for mpirun says that the default ranking procedure is round-robin 
by slot. It doesn't seem to be that straight-forward to me, though, and I 
wanted to ask about the behavior.

To help illustrate my confusion, here are a few examples where the ranking 
behavior changed based on the mapping behavior, which doesn't make sense to me, 
yet. First, here is a simple map by core (using 4 nodes of 32 cpu cores each):

$> mpirun -n 128 --map-by core --report-bindings true
[gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././././././././././././.][./