Just as a fun follow-up: if you wanted to load-balance across nodes as well as within nodes, then you would add the “span” modifier to map-by:
$ mpirun --map-by socket:span,pe=2 --rank-by core --report-bindings -n 8 hostname
[rhc001:162391] SETTING BINDING TO CORE
[rhc001:162391] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc001:162391] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:162391] MCW rank 2 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:162391] MCW rank 3 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
[rhc002.cluster:150295] MCW rank 4 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 5 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 6 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 7 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]

“span” causes ORTE to treat all the sockets etc. as being on a single giant node.

HTH
Ralph

> On Feb 23, 2017, at 6:38 AM, r...@open-mpi.org wrote:
>
> From the mpirun man page:
>
> ******************
> Open MPI employs a three-phase procedure for assigning process locations and
> ranks:
> mapping
>     Assigns a default location to each process
> ranking
>     Assigns an MPI_COMM_WORLD rank value to each process
> binding
>     Constrains each process to run on specific processors
> The mapping step is used to assign a default location to each process based
> on the mapper being employed. Mapping by slot, node, and sequentially results
> in the assignment of the processes to the node level. In contrast, mapping by
> object allows the mapper to assign the process to an actual object on each
> node.
>
> Note: the location assigned to the process is independent of where it will be
> bound - the assignment is used solely as input to the binding algorithm.
>
> The mapping of processes to nodes can be defined not just with general
> policies but also, if necessary, using arbitrary mappings that cannot be
> described by a simple policy. One can use the "sequential mapper," which
> reads the hostfile line by line, assigning processes to nodes in whatever
> order the hostfile specifies. Use the -mca rmaps seq option. For example,
> using the same hostfile as before:
>
> mpirun -hostfile myhostfile -mca rmaps seq ./a.out
>
> will launch three processes, one on each of nodes aa, bb, and cc,
> respectively. The slot counts don’t matter; one process is launched per line
> on whatever node is listed on the line.
>
> Another way to specify arbitrary mappings is with a rankfile, which gives you
> detailed control over process binding as well. Rankfiles are discussed below.
>
> The second phase focuses on the ranking of the process within the job’s
> MPI_COMM_WORLD. Open MPI separates this from the mapping procedure to allow
> more flexibility in the relative placement of MPI processes.
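As an aside, to make the two arbitrary-mapping options above concrete, the files involved might look roughly like the following. This is an untested sketch: the hostnames aa/bb/cc come from the man page example, while the file names "myhostfile"/"myrankfile" and the socket:core numbers are made up purely for illustration.

$ cat myhostfile
aa slots=2
bb slots=2
cc slots=2
$ mpirun -hostfile myhostfile -mca rmaps seq ./a.out

$ cat myrankfile
rank 0=aa slot=0:0-1
rank 1=bb slot=0:0-1
rank 2=cc slot=1:0-1
$ mpirun -hostfile myhostfile -rf myrankfile ./a.out

The seq mapper simply walks the hostfile lines in order, while each rankfile line pins one MPI_COMM_WORLD rank to a host and a socket:core location on that host.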
>
> The binding phase actually binds each process to a given set of processors.
> This can improve performance if the operating system is placing processes
> suboptimally. For example, it might oversubscribe some multi-core processor
> sockets, leaving other sockets idle; this can lead processes to contend
> unnecessarily for common resources. Or, it might spread processes out too
> widely; this can be suboptimal if application performance is sensitive to
> interprocess communication costs. Binding can also keep the operating system
> from migrating processes excessively, regardless of how optimally those
> processes were placed to begin with.
> ********************
>
> So what you probably want is: --map-by socket:pe=N --rank-by core
>
> Remember, the pe=N modifier automatically forces binding at the cpu level.
> The rank-by directive defaults to rank-by socket when you map-by socket,
> hence you need to specify that you want it to rank by core instead. Here is
> the result of doing that on my box:
>
> $ mpirun --map-by socket:pe=2 --rank-by core --report-bindings -n 8 hostname
> [rhc001:154283] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 2 bound to socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/../../../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 3 bound to socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [../../../../../../BB/BB/../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 4 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
> [rhc001:154283] MCW rank 5 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
> [rhc001:154283] MCW rank 6 bound to socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/BB/../../../../../..]
> [rhc001:154283] MCW rank 7 bound to socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/BB/../../../..]
>
>
> HTH
> Ralph
>
>> On Feb 23, 2017, at 6:18 AM, gil...@rist.or.jp wrote:
>>
>> Mark,
>>
>> what about
>> mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog
>>
>> it is a fit for 1) and 2) but not 3)
>>
>> if you use OpenMP and want 2 threads per task, then you can
>> export OMP_NUM_THREADS=2
>> not to use 4 threads by default (with most OpenMP runtimes)
>>
>> Cheers,
>>
>> Gilles
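Putting Gilles' two snippets together, the hybrid MPI+OpenMP launch he is suggesting would look something like this (an untested sketch for Mark's single 2 x 12-core node; ./prog is just the placeholder application name used elsewhere in the thread):

$ export OMP_NUM_THREADS=2
$ mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog

Each of the 6 ranks is mapped and bound to a block of 4 cores, which fills the two 12-core sockets evenly, while OMP_NUM_THREADS keeps each rank to 2 threads inside its block. For runs spanning several nodes you would also add mpirun's -x OMP_NUM_THREADS so the variable is forwarded to the remote ranks.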
>> ----- Original Message -----
>>> Hi,
>>>
>>> I'm still trying to figure out how to express the core binding I want to
>>> openmpi 2.x via the --map-by option. Can anyone help, please?
>>>
>>> I bet I'm being dumb, but it's proving tricky to achieve the following
>>> aims (most important first):
>>>
>>> 1) Maximise memory bandwidth usage (e.g. load balance ranks across
>>> processor sockets)
>>> 2) Optimise for nearest-neighbour comms (in MPI_COMM_WORLD) (e.g. put
>>> neighbouring ranks on the same socket)
>>> 3) Have an incantation that's simple to change based on number of ranks
>>> and processes per rank I want.
>>>
>>> Example:
>>>
>>> Considering a 2 socket, 12 cores/socket box and a program with 2 threads
>>> per rank...
>>>
>>> ... this is great if I fully-populate the node:
>>>
>>> $ mpirun -np 12 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
>>> [somehost:101235] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>>> [somehost:101235] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>>> [somehost:101235] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>>> [somehost:101235] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
>>> [somehost:101235] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
>>> [somehost:101235] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
>>> [somehost:101235] MCW rank 6 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
>>> [somehost:101235] MCW rank 7 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
>>> [somehost:101235] MCW rank 8 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
>>> [somehost:101235] MCW rank 9 bound to socket 1[core 18[hwt 0]], socket 1[core 19[hwt 0]]: [./././././././././././.][././././././B/B/./././.]
>>> [somehost:101235] MCW rank 10 bound to socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: [./././././././././././.][././././././././B/B/./.]
>>> [somehost:101235] MCW rank 11 bound to socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]]: [./././././././././././.][././././././././././B/B]
>>>
>>>
>>> ... but not if I don't [fails aim (1)]:
>>>
>>> $ mpirun -np 6 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
>>> [somehost:102035] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>>> [somehost:102035] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>>> [somehost:102035] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>>> [somehost:102035] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
>>> [somehost:102035] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
>>> [somehost:102035] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
>>>
>>>
>>> ... whereas if I map by socket instead of slot, I achieve aim (1) but
>>> fail on aim (2):
>>>
>>> $ mpirun -np 6 -map-by socket:PE=2 --bind-to core --report-bindings ./prog
>>> [somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>>> [somehost:105601] MCW rank 1 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
>>> [somehost:105601] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>>> [somehost:105601] MCW rank 3 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
>>> [somehost:105601] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>>> [somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
>>>
>>>
>>> Any ideas, please?
>>>
>>> Thanks,
>>>
>>> Mark
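Pulling the thread's suggestions together for Mark's 2-socket, 12-cores-per-socket box, a launch line along these lines should cover all three aims (an untested sketch; ./prog, the rank count and the thread count are placeholders to be adjusted together):

$ export OMP_NUM_THREADS=2
$ mpirun -np 6 --map-by socket:PE=2 --rank-by core --report-bindings ./prog

Mapping by socket spreads the ranks evenly over the two sockets (aim 1), ranking by core keeps neighbouring ranks together on a socket (aim 2), and PE tracks the threads per rank rather than the rank count, so only -np changes when the number of ranks changes (aim 3). For multi-node runs, the span modifier described at the top of this message (--map-by socket:span,PE=2) extends the same balancing across nodes.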
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users