One thing that might help is the *--rank-by* argument, which lets you specify how ranks are assigned separately from mapping/binding (by default we follow the mapping pattern).
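To make the mapping/ranking distinction concrete, here is a toy bash model of the two steps. It is only a sketch, not Open MPI code: the variable names and the round-robin arithmetic are assumptions chosen to match the bindings reported in this thread, for a 2-socket, 12-cores-per-socket node with 6 ranks and PE=2.

```shell
#!/usr/bin/env bash
# Toy model only: this mimics the placements shown in this thread; it is not
# how Open MPI implements mapping or ranking internally.
sockets=2            # assumed node layout: 2 sockets
cores_per_socket=12  # 12 cores per socket
np=6                 # number of MPI ranks
pe=2                 # cores per rank (the PE=2 modifier)

# Step 1: map by socket, i.e. deal processes round-robin across sockets.
first_core=()
for ((i = 0; i < np; i++)); do
  socket=$((i % sockets))
  slot=$((i / sockets))
  first_core[i]=$((socket * cores_per_socket + slot * pe))
done

# Step 2a: default ranking follows the mapping order.
echo "ranking follows mapping:"
for ((i = 0; i < np; i++)); do
  c=${first_core[i]}
  echo "rank $i -> socket $((c / cores_per_socket)), cores $c-$((c + pe - 1))"
done

# Step 2b: rank by core, i.e. renumber the same placements in core order.
echo "ranking by core:"
rank=0
for c in $(printf '%s\n' "${first_core[@]}" | sort -n); do
  echo "rank $rank -> socket $((c / cores_per_socket)), cores $c-$((c + pe - 1))"
  rank=$((rank + 1))
done
```

In this model the set of bound cores is identical in both listings; only the rank numbers attached to them change. The first listing alternates sockets rank by rank, while the second puts ranks 0-2 on socket 0 and ranks 3-5 on socket 1.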
For example - adding *--rank-by* to your last example:

$ mpirun -np 6 -map-by socket:PE=2 --bind-to core --rank-by core --report-bindings ./prog
[somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
[somehost:105601] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
[somehost:105601] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
[somehost:105601] MCW rank 3 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
[somehost:105601] MCW rank 4 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
[somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]

Is that what you are looking for?

On Thu, Feb 23, 2017 at 8:18 AM, <gil...@rist.or.jp> wrote:

> Mark,
>
> what about
> mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog
>
> it is a fit for 1) and 2) but not 3)
>
> if you use OpenMP and want 2 threads per task, then you can
> export OMP_NUM_THREADS=2
> not to use 4 threads by default (with most OpenMP runtimes)
>
> Cheers,
>
> Gilles
> ----- Original Message -----
> > Hi,
> >
> > I'm still trying to figure out how to express the core binding I want to
> > openmpi 2.x via the --map-by option. Can anyone help, please?
> >
> > I bet I'm being dumb, but it's proving tricky to achieve the following
> > aims (most important first):
> >
> > 1) Maximise memory bandwidth usage (e.g. load balance ranks across
> > processor sockets)
> > 2) Optimise for nearest-neighbour comms (in MPI_COMM_WORLD) (e.g. put
> > neighbouring ranks on the same socket)
> > 3) Have an incantation that's simple to change based on number of ranks
> > and processes per rank I want.
> >
> > Example:
> >
> > Considering a 2 socket, 12 cores/socket box and a program with 2 threads
> > per rank...
> >
> > ... this is great if I fully-populate the node:
> >
> > $ mpirun -np 12 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
> > [somehost:101235] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
> > [somehost:101235] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
> > [somehost:101235] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
> > [somehost:101235] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
> > [somehost:101235] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
> > [somehost:101235] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
> > [somehost:101235] MCW rank 6 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
> > [somehost:101235] MCW rank 7 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
> > [somehost:101235] MCW rank 8 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
> > [somehost:101235] MCW rank 9 bound to socket 1[core 18[hwt 0]], socket 1[core 19[hwt 0]]: [./././././././././././.][././././././B/B/./././.]
> > [somehost:101235] MCW rank 10 bound to socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: [./././././././././././.][././././././././B/B/./.]
> > [somehost:101235] MCW rank 11 bound to socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]]: [./././././././././././.][././././././././././B/B]
> >
> > ... but not if I don't [fails aim (1)]:
> >
> > $ mpirun -np 6 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
> > [somehost:102035] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
> > [somehost:102035] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
> > [somehost:102035] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
> > [somehost:102035] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
> > [somehost:102035] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
> > [somehost:102035] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
> >
> > ... whereas if I map by socket instead of slot, I achieve aim (1) but
> > fail on aim (2):
> >
> > $ mpirun -np 6 -map-by socket:PE=2 --bind-to core --report-bindings ./prog
> > [somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
> > [somehost:105601] MCW rank 1 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
> > [somehost:105601] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
> > [somehost:105601] MCW rank 3 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
> > [somehost:105601] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
> > [somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
> >
> > Any ideas, please?
> >
> > Thanks,
> >
> > Mark

--
Josh Hursey
IBM Spectrum MPI Developer
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users