One thing that might help is the --rank-by argument, which lets you
specify how ranks are assigned separately from mapping/binding (by default we
follow the mapping pattern).
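
To make the mapping/ranking split concrete, here is a toy Python model. It is
only a sketch and not Open MPI's actual mapper: the function names are made up
for illustration, and it assumes the 2-socket, 12-cores-per-socket box with
PE=2 from your message. Mapping decides where each process lands; ranking
decides which number each placement gets.

```python
# Toy model only: NOT Open MPI's implementation, just the round-robin idea.

def map_by_socket(nprocs, sockets=2, cores_per_socket=12, pe=2):
    """Place processes round-robin across sockets, pe cores each.

    Returns a list of (socket, first_core) tuples in mapping order.
    """
    next_free = [s * cores_per_socket for s in range(sockets)]
    placed = []
    for i in range(nprocs):
        s = i % sockets
        if next_free[s] + pe > (s + 1) * cores_per_socket:
            raise ValueError("not enough cores on socket %d" % s)
        placed.append((s, next_free[s]))
        next_free[s] += pe
    return placed

def rank_by_mapping(placed):
    """Default behaviour in this model: ranks follow the mapping order."""
    return dict(enumerate(placed))

def rank_by_core(placed):
    """The --rank-by core idea: number the same placements in core order."""
    return dict(enumerate(sorted(placed, key=lambda p: p[1])))

placed = map_by_socket(6)
for rank, (socket, core) in rank_by_core(placed).items():
    print(f"rank {rank}: socket {socket}, first core {core}")
```

In this model the placements are identical in both cases; only the rank
numbers attached to them change.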

For example, adding --rank-by core to your last example:
$ mpirun -np 6 -map-by socket:PE=2 --bind-to core --rank-by core --report-bindings ./prog
[somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
[somehost:105601] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
[somehost:105601] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
[somehost:105601] MCW rank 3 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
[somehost:105601] MCW rank 4 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
[somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
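
If you would rather check a layout against aims (1) and (2) than read the
masks by eye, something like the following Python sketch could help. The
helper names are hypothetical, and the regular expression assumes the
--report-bindings line format shown in this thread, which may differ in other
Open MPI versions.

```python
import re
from collections import Counter

# Assumed line format: the --report-bindings output quoted in this thread.
BOUND = re.compile(r"MCW rank (\d+) bound to socket (\d+)\[")

def rank_sockets(report):
    """Return {rank: socket of the first core that rank is bound to}."""
    return {int(rank): int(sock) for rank, sock in BOUND.findall(report)}

def summarise(report):
    """Return (ranks per socket, adjacent-rank pairs split across sockets)."""
    sockets = rank_sockets(report)
    per_socket = dict(Counter(sockets.values()))
    split_pairs = sum(
        1 for r in sockets
        if r + 1 in sockets and sockets[r] != sockets[r + 1]
    )
    return per_socket, split_pairs

# Start of each line from the run above (the rest of each line is trimmed).
sample = """\
[somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]]
[somehost:105601] MCW rank 1 bound to socket 0[core 2[hwt 0]]
[somehost:105601] MCW rank 2 bound to socket 0[core 4[hwt 0]]
[somehost:105601] MCW rank 3 bound to socket 1[core 12[hwt 0]]
[somehost:105601] MCW rank 4 bound to socket 1[core 14[hwt 0]]
[somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]]
"""

print(summarise(sample))  # ({0: 3, 1: 3}, 1)
```

An even count per socket corresponds to aim (1), and a low number of split
neighbour pairs corresponds to aim (2).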

Is that what you are looking for?



On Thu, Feb 23, 2017 at 8:18 AM, <gil...@rist.or.jp> wrote:

> Mark,
>
> what about
> mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog
>
> it is a fit for 1) and 2) but not 3)
>
> if you use OpenMP and want 2 threads per task, then you can
> export OMP_NUM_THREADS=2
> not to use 4 threads by default (with most OpenMP runtimes)
>
> Cheers,
>
> Gilles
> ----- Original Message -----
> > Hi,
> >
> > I'm still trying to figure out how to express the core binding I want to
> > openmpi 2.x via the --map-by option. Can anyone help, please?
> >
> > I bet I'm being dumb, but it's proving tricky to achieve the following
> > aims (most important first):
> >
> > 1) Maximise memory bandwidth usage (e.g. load balance ranks across
> >     processor sockets)
> > 2) Optimise for nearest-neighbour comms (in MPI_COMM_WORLD) (e.g. put
> >     neighbouring ranks on the same socket)
> > 3) Have an incantation that's simple to change based on number of ranks
> >     and processes per rank I want.
> >
> > Example:
> >
> > Considering a 2 socket, 12 cores/socket box and a program with 2 threads
> > per rank...
> >
> > ... this is great if I fully-populate the node:
> >
> > $ mpirun -np 12 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
> > [somehost:101235] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
> > [somehost:101235] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
> > [somehost:101235] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
> > [somehost:101235] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
> > [somehost:101235] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
> > [somehost:101235] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
> > [somehost:101235] MCW rank 6 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
> > [somehost:101235] MCW rank 7 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
> > [somehost:101235] MCW rank 8 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
> > [somehost:101235] MCW rank 9 bound to socket 1[core 18[hwt 0]], socket 1[core 19[hwt 0]]: [./././././././././././.][././././././B/B/./././.]
> > [somehost:101235] MCW rank 10 bound to socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: [./././././././././././.][././././././././B/B/./.]
> > [somehost:101235] MCW rank 11 bound to socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]]: [./././././././././././.][././././././././././B/B]
> >
> >
> > ... but not if I don't [fails aim (1)]:
> >
> > $ mpirun -np 6 -map-by slot:PE=2 --bind-to core --report-bindings ./prog
> > [somehost:102035] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
> > [somehost:102035] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
> > [somehost:102035] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
> > [somehost:102035] MCW rank 3 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././././B/B/./././.][./././././././././././.]
> > [somehost:102035] MCW rank 4 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]]: [././././././././B/B/./.][./././././././././././.]
> > [somehost:102035] MCW rank 5 bound to socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././././B/B][./././././././././././.]
> >
> >
> > ... whereas if I map by socket instead of slot, I achieve aim (1) but
> > fail on aim (2):
> >
> > $ mpirun -np 6 -map-by socket:PE=2 --bind-to core --report-bindings ./prog
> > [somehost:105601] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././././.][./././././././././././.]
> > [somehost:105601] MCW rank 1 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]]: [./././././././././././.][B/B/./././././././././.]
> > [somehost:105601] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././././.][./././././././././././.]
> > [somehost:105601] MCW rank 3 bound to socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././././././.][././B/B/./././././././.]
> > [somehost:105601] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././././.][./././././././././././.]
> > [somehost:105601] MCW rank 5 bound to socket 1[core 16[hwt 0]], socket 1[core 17[hwt 0]]: [./././././././././././.][././././B/B/./././././.]
> >
> >
> > Any ideas, please?
> >
> > Thanks,
> >
> > Mark
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >
>
>
>



-- 
Josh Hursey
IBM Spectrum MPI Developer
