From the mpirun man page:

******************
Open MPI employs a three-phase procedure for assigning process locations and 
ranks:
mapping
Assigns a default location to each process
ranking
Assigns an MPI_COMM_WORLD rank value to each process
binding
Constrains each process to run on specific processors
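Each phase has its own mpirun option: --map-by, --rank-by, and --bind-to. As a
rough illustration only, with ./a.out standing in for your MPI binary, the three
can be combined like this:

$ mpirun -np 4 --map-by socket --rank-by core --bind-to core ./a.out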
The mapping step is used to assign a default location to each process based on
the mapper being employed. Mapping by slot, by node, or sequentially assigns
processes at the node level. In contrast, mapping by object allows the mapper
to assign each process to an actual object on each node.

Note: the location assigned to the process is independent of where it will be 
bound - the assignment is used solely as input to the binding algorithm.

The mapping of processes to nodes can be defined not just with general 
policies but also, if necessary, using arbitrary mappings that cannot be 
described by a simple policy. One can use the "sequential mapper," which reads 
the hostfile line by line, assigning processes to nodes in whatever order the 
hostfile specifies. Use the -mca rmaps seq option. For example, using the same 
hostfile as before:

mpirun -hostfile myhostfile -mca rmaps seq ./a.out

will launch three processes, one on each of nodes aa, bb, and cc, respectively. 
The slot counts don’t matter; one process is launched per line on whatever node 
is listed on the line.
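For reference, the hostfile referred to above is along these lines (node names
aa, bb, and cc; the slot counts shown are just the ones used earlier in the man
page and, as noted, are ignored by the sequential mapper):

$ cat myhostfile
aa slots=2
bb slots=2
cc slots=2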

Another way to specify arbitrary mappings is with a rankfile, which gives you 
detailed control over process binding as well. Rankfiles are discussed below.
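Since the rest of that man page section is not quoted here: a rankfile is a
plain text file whose lines are roughly of the form "rank N=<host> slot=<binding>".
A sketch, with hosts aa/bb/cc as above and purely illustrative bindings:

$ cat myrankfile
rank 0=aa slot=1:0-2
rank 1=bb slot=0:0,1
rank 2=cc slot=1-2
$ mpirun -H aa,bb,cc -np 3 -rf myrankfile ./a.out

Here rank 0 would run on aa bound to socket 1, cores 0-2; rank 1 on bb bound to
socket 0, cores 0 and 1; and rank 2 on cc bound to logical cores 1 and 2.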

The second phase focuses on the ranking of the processes within the job’s
MPI_COMM_WORLD. Open MPI separates this from the mapping procedure to allow
more flexibility in the relative placement of MPI processes.

The binding phase actually binds each process to a given set of processors. 
This can improve performance if the operating system is placing processes 
suboptimally. For example, it might oversubscribe some multi-core processor 
sockets, leaving other sockets idle; this can lead processes to contend 
unnecessarily for common resources. Or, it might spread processes out too 
widely; this can be suboptimal if application performance is sensitive to 
interprocess communication costs. Binding can also keep the operating system 
from migrating processes excessively, regardless of how optimally those 
processes were placed to begin with.
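The binding policy itself is selected with --bind-to; for example (illustrative
only, ./a.out again standing in for your binary):

$ mpirun -np 4 --bind-to core --report-bindings ./a.out     # pin each process to one core
$ mpirun -np 4 --bind-to socket --report-bindings ./a.out   # let each process float within its socket
$ mpirun -np 4 --bind-to none ./a.out                       # disable binding entirely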
********************

So what you probably want is:  --map-by socket:pe=N --rank-by core

Remember, the pe=N modifier automatically forces binding at the CPU level. The
rank-by directive defaults to rank-by socket when you map-by socket, hence you
need to specify that you want it to rank by core instead. Here is the result of
doing that on my box:

$ mpirun --map-by socket:pe=2 --rank-by core --report-bindings -n 8 hostname
[rhc001:154283] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 2 bound to socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/../../../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 3 bound to socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [../../../../../../BB/BB/../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 4 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:154283] MCW rank 5 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
[rhc001:154283] MCW rank 6 bound to socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/BB/../../../../../..]
[rhc001:154283] MCW rank 7 bound to socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/BB/../../../..]
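Applied to the 2-socket, 12-cores/socket example from the original question (6
ranks, 2 OpenMP threads per rank), the same recipe would be something like the
following sketch, with ./prog standing in for the hybrid binary and
OMP_NUM_THREADS set per Gilles' suggestion:

$ export OMP_NUM_THREADS=2
$ mpirun -np 6 --map-by socket:pe=2 --rank-by core --report-bindings ./prog

That should place ranks 0-2 on socket 0 and ranks 3-5 on socket 1, each rank
bound to 2 cores for its 2 threads - aims (1) and (2) - and only -np and pe=N
need adjusting as the rank/thread counts change, which addresses (3).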


HTH
Ralph

> On Feb 23, 2017, at 6:18 AM, <gil...@rist.or.jp> wrote:
> 
> Mark,
> 
> what about
> mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog
> 
> It is a fit for 1) and 2), but not 3).
> 
> If you use OpenMP and want 2 threads per task, you can
> export OMP_NUM_THREADS=2
> so that you do not get 4 threads per task by default (with most OpenMP runtimes).
> 
> Cheers,
> 
> Gilles
> ----- Original Message -----
>> Hi,
>> 
>> I'm still trying to figure out how to express the core binding I want
>> to openmpi 2.x via the --map-by option. Can anyone help, please?
>> 
>> I bet I'm being dumb, but it's proving tricky to achieve the following 
>> aims (most important first):
>> 
>> 1) Maximise memory bandwidth usage (e.g. load balance ranks across
>>    processor sockets)
>> 2) Optimise for nearest-neighbour comms (in MPI_COMM_WORLD) (e.g. put
>>    neighbouring ranks on the same socket)
>> 3) Have an incantation that's simple to change based on number of
>>    ranks and processes per rank I want.
>> 
>> Example:
>> 
>> Considering a 2 socket, 12 cores/socket box and a program with 2
>> threads per rank...
>> 
>> ... this is great if I fully-populate the node:
>> 
>> $ mpirun -np 12 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
>> [somehost:101235] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>> [somehost:101235] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>> [somehost:101235] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>> [somehost:101235] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
>> [somehost:101235] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
>> [somehost:101235] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
>> [somehost:101235] MCW rank 6 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
>> [somehost:101235] MCW rank 7 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
>> [somehost:101235] MCW rank 8 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
>> [somehost:101235] MCW rank 9 bound to socket 1[core 18[hwt 0]], socket 1[core 19[hwt 0]]: [./././././././././././.][././././././B/B/./././.]
>> [somehost:101235] MCW rank 10 bound to socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: [./././././././././././.][././././././././B/B/./.]
>> [somehost:101235] MCW rank 11 bound to socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]]: [./././././././././././.][././././././././././B/B]
>> 
>> 
>> ... but not if I don't [fails aim (1)]:
>> 
>> $ mpirun -np 6 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
>> [somehost:102035] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>> [somehost:102035] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>> [somehost:102035] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>> [somehost:102035] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
>> [somehost:102035] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
>> [somehost:102035] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
>> 
>> 
>> ... whereas if I map by socket instead of slot, I achieve aim (1) but 
>> fail on aim (2):
>> 
>> $ mpirun -np 6 -map-by socket:PE=2 --bind-to core --report-bindings ./prog
>> [somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>> [somehost:105601] MCW rank 1 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
>> [somehost:105601] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>> [somehost:105601] MCW rank 3 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
>> [somehost:105601] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>> [somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
>> 
>> 
>> Any ideas, please?
>> 
>> Thanks,
>> 
>> Mark
