Wirawan Purwanto <wiraw...@gmail.com> writes:

> Instead of the scenario above, I was trying to get the MPI processes
> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
> node 0 first, then fill node 1, and so on. How do I do this properly?
>
> I tried a few attempts that fail:
>
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE

...

> Clearly I am not understanding how this map-by works. Could somebody
> help me? There was a wiki article partially written:
>
> https://github.com/open-mpi/ompi/wiki/ProcessPlacement
>
> but unfortunately it is also not clear to me.

Me neither; this stuff has traditionally been quite unclear and really
needs documenting/explaining properly.

The following, from my local instructions for OMPI 1.8, probably does
what you want for OMP_NUM_THREADS=2 (the qrsh options just get me a
couple of small nodes). In the output, each rank is bound to two cores,
and the first node is filled before the second:

  $ qrsh -pe mpi 24 -l num_proc=12 \
     mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true |&
     sort -k 4 -n
  [comp544:03093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
  [comp544:03093] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
  [comp544:03093] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
  [comp544:03093] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
  [comp544:03093] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
  [comp544:03093] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]
  [comp527:03056] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
  [comp527:03056] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
  [comp527:03056] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
  [comp527:03056] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
  [comp527:03056] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
  [comp527:03056] MCW rank 11 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]

I don't remember how I found that out.
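
For what it's worth, a quick way to check a layout like the one above is
to count the ranks per host in the --report-bindings output. This is
only a minimal sketch: it assumes the OMPI 1.8-style line format shown
above ("[host:pid] MCW rank N bound to ..."), and ranks_per_host is a
hypothetical helper name, not something Open MPI provides.

```shell
#!/bin/sh
# Sketch: summarise `mpirun --report-bindings` output as ranks per host.
# Assumption: lines look like "[host:pid] MCW rank N bound to ...".
# ranks_per_host is a hypothetical helper, not part of Open MPI.
ranks_per_host() {
    awk '/MCW rank/ {
        host = $1
        sub(/^\[/, "", host)     # drop the leading "["
        sub(/:.*$/, "", host)    # drop ":pid]"
        rank = $4 + 0            # fourth field is the rank number
        if (!(host in count)) {  # first time this host is seen
            order[++n] = host
            lo[host] = rank
            hi[host] = rank
        }
        count[host]++
        if (rank < lo[host]) lo[host] = rank
        if (rank > hi[host]) hi[host] = rank
    }
    END {
        # Report hosts in the order they first appeared in the input.
        for (i = 1; i <= n; i++) {
            h = order[i]
            printf "%s: %d ranks (%d-%d)\n", h, count[h], lo[h], hi[h]
        }
    }'
}

# Example use (commented out so the sketch runs without MPI installed):
#   mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true 2>&1 |
#       ranks_per_host
```

Fed the twelve binding lines above, this should print
"comp544: 6 ranks (0-5)" and "comp527: 6 ranks (6-11)", i.e. the first
node is filled before the second. The 2>&1 mirrors the |& used in the
command above.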