Just as a fun follow-up: if you wanted to load-balance across nodes as well as within nodes, then you would add the “span” modifier to map-by:
$ mpirun --map-by socket:span,pe=2 --rank-by core --report-bindings -n 8 hostname
[rhc001:162391] SETTING BINDING TO CORE
[rhc001:162391] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc001:162391] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:162391] MCW rank 2 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:162391] MCW rank 3 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
[rhc002.cluster:150295] MCW rank 4 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 5 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 6 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 7 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]

“span” causes ORTE to treat all the sockets etc. as being on a single giant node.

HTH
Ralph

> On Feb 23, 2017, at 6:38 AM, r...@open-mpi.org wrote:
>
> From the mpirun man page:
>
> ******************
> Open MPI employs a three-phase procedure for assigning process locations and
> ranks:
> mapping
>     Assigns a default location to each process
> ranking
>     Assigns an MPI_COMM_WORLD rank value to each process
> binding
>     Constrains each process to run on specific processors
> The mapping step is used to assign a default location to each process based
> on the mapper being employed. Mapping by slot, node, and sequentially results
> in the assignment of the processes to the node level. In contrast, mapping by
> object allows the mapper to assign the process to an actual object on each
> node.
>
> Note: the location assigned to the process is independent of where it will be
> bound - the assignment is used solely as input to the binding algorithm.
>
> The mapping of processes to nodes can be defined not just with general
> policies but also, if necessary, using arbitrary mappings that cannot be
> described by a simple policy. One can use the "sequential mapper," which
> reads the hostfile line by line, assigning processes to nodes in whatever
> order the hostfile specifies. Use the -mca rmaps seq option. For example,
> using the same hostfile as before:
>
> mpirun -hostfile myhostfile -mca rmaps seq ./a.out
>
> will launch three processes, one on each of nodes aa, bb, and cc,
> respectively. The slot counts don’t matter; one process is launched per line
> on whatever node is listed on the line.
>
> Another way to specify arbitrary mappings is with a rankfile, which gives you
> detailed control over process binding as well. Rankfiles are discussed below.
>
> The second phase focuses on the ranking of the process within the job’s
> MPI_COMM_WORLD. Open MPI separates this from the mapping procedure to allow
> more flexibility in the relative placement of MPI processes.
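As an aside, to make the two arbitrary-mapping options above concrete, the files involved might look roughly like the following. This is an untested sketch: the hostnames aa/bb/cc come from the man page example, while the file names "myhostfile"/"myrankfile" and the socket:core numbers are made up purely for illustration.

$ cat myhostfile
aa slots=2
bb slots=2
cc slots=2
$ mpirun -hostfile myhostfile -mca rmaps seq ./a.out

$ cat myrankfile
rank 0=aa slot=0:0-1
rank 1=bb slot=0:0-1
rank 2=cc slot=1:0-1
$ mpirun -hostfile myhostfile -rf myrankfile ./a.out

The seq mapper simply walks the hostfile lines in order, while each rankfile line pins one MPI_COMM_WORLD rank to a host and a socket:core location on that host.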
>
> The binding phase actually binds each process to a given set of processors.
> This can improve performance if the operating system is placing processes
> suboptimally. For example, it might oversubscribe some multi-core processor
> sockets, leaving other sockets idle; this can lead processes to contend
> unnecessarily for common resources. Or, it might spread processes out too
> widely; this can be suboptimal if application performance is sensitive to
> interprocess communication costs. Binding can also keep the operating system
> from migrating processes excessively, regardless of how optimally those
> processes were placed to begin with.
> ********************
>
> So what you probably want is: --map-by socket:pe=N --rank-by core
>
> Remember, the pe=N modifier automatically forces binding at the cpu level.
> The rank-by directive defaults to rank-by socket when you map-by socket,
> hence you need to specify that you want it to rank by core instead. Here is
> the result of doing that on my box:
>
> $ mpirun --map-by socket:pe=2 --rank-by core --report-bindings -n 8 hostname
> [rhc001:154283] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 2 bound to socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/../../../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 3 bound to socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [../../../../../../BB/BB/../../../..][../../../../../../../../../../../..]
> [rhc001:154283] MCW rank 4 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
> [rhc001:154283] MCW rank 5 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
> [rhc001:154283] MCW rank 6 bound to socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/BB/../../../../../..]
> [rhc001:154283] MCW rank 7 bound to socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/BB/../../../..]
>
>
> HTH
> Ralph
>
>> On Feb 23, 2017, at 6:18 AM, gil...@rist.or.jp wrote:
>>
>> Mark,
>>
>> what about
>> mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog
>>
>> it is a fit for 1) and 2) but not 3)
>>
>> if you use OpenMP and want 2 threads per task, then you can
>> export OMP_NUM_THREADS=2
>> not to use 4 threads by default (with most OpenMP runtimes)
>>
>> Cheers,
>>
>> Gilles
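Putting Gilles' two snippets together, the hybrid MPI+OpenMP launch he is suggesting would look something like this (an untested sketch for Mark's single 2 x 12-core node; ./prog is just the placeholder application name used elsewhere in the thread):

$ export OMP_NUM_THREADS=2
$ mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog

Each of the 6 ranks is mapped and bound to a block of 4 cores, which fills the two 12-core sockets evenly, while OMP_NUM_THREADS keeps each rank to 2 threads inside its block. For runs spanning several nodes you would also add mpirun's -x OMP_NUM_THREADS so the variable is forwarded to the remote ranks.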
>> ----- Original Message -----
>>> Hi,
>>>
>>> I'm still trying to figure out how to express the core binding I want to
>>> openmpi 2.x via the --map-by option. Can anyone help, please?
>>>
>>> I bet I'm being dumb, but it's proving tricky to achieve the following
>>> aims (most important first):
>>>
>>> 1) Maximise memory bandwidth usage (e.g. load balance ranks across
>>> processor sockets)
>>> 2) Optimise for nearest-neighbour comms (in MPI_COMM_WORLD) (e.g. put
>>> neighbouring ranks on the same socket)
>>> 3) Have an incantation that's simple to change based on number of ranks
>>> and processes per rank I want.
>>>
>>> Example:
>>>
>>> Considering a 2 socket, 12 cores/socket box and a program with 2 threads
>>> per rank...
>>>
>>> ... this is great if I fully-populate the node:
>>>
>>> $ mpirun -np 12 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
>>> [somehost:101235] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>>> [somehost:101235] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>>> [somehost:101235] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>>> [somehost:101235] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
>>> [somehost:101235] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
>>> [somehost:101235] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
>>> [somehost:101235] MCW rank 6 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
>>> [somehost:101235] MCW rank 7 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
>>> [somehost:101235] MCW rank 8 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
>>> [somehost:101235] MCW rank 9 bound to socket 1[core 18[hwt 0]], socket 1[core 19[hwt 0]]: [./././././././././././.][././././././B/B/./././.]
>>> [somehost:101235] MCW rank 10 bound to socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: [./././././././././././.][././././././././B/B/./.]
>>> [somehost:101235] MCW rank 11 bound to socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]]: [./././././././././././.][././././././././././B/B]
>>>
>>>
>>> ... but not if I don't [fails aim (1)]:
>>>
>>> $ mpirun -np 6 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
>>> [somehost:102035] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>>> [somehost:102035] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>>> [somehost:102035] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>>> [somehost:102035] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
>>> [somehost:102035] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
>>> [somehost:102035] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
>>>
>>>
>>> ... whereas if I map by socket instead of slot, I achieve aim (1) but
>>> fail on aim (2):
>>>
>>> $ mpirun -np 6 -map-by socket:PE=2 --bind-to core --report-bindings ./prog
>>> [somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
>>> [somehost:105601] MCW rank 1 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
>>> [somehost:105601] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
>>> [somehost:105601] MCW rank 3 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
>>> [somehost:105601] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
>>> [somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
>>>
>>>
>>> Any ideas, please?
>>>
>>> Thanks,
>>>
>>> Mark
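Pulling the thread's suggestions together for Mark's 2-socket, 12-cores-per-socket box, a launch line along these lines should cover all three aims (an untested sketch; ./prog, the rank count and the thread count are placeholders to be adjusted together):

$ export OMP_NUM_THREADS=2
$ mpirun -np 6 --map-by socket:PE=2 --rank-by core --report-bindings ./prog

Mapping by socket spreads the ranks evenly over the two sockets (aim 1), ranking by core keeps neighbouring ranks together on a socket (aim 2), and PE tracks the threads per rank rather than the rank count, so only -np changes when the number of ranks changes (aim 3). For multi-node runs, the span modifier described at the top of this message (--map-by socket:span,PE=2) extends the same balancing across nodes.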
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users