Hmmm...well, the proc distribution is easy - you would just --map-by node. The 
tricky part is assigning the ranks in the pattern you desire. We definitely 
don't have that pattern in our ranking algorithm today, though it wouldn't be 
hard to add.

However, that wouldn't be available until OMPI v5 is released later this year. 
Are you going to be using this long enough to warrant a dedicated option?

Of course, you could fake it out even today by breaking the job into multiple 
app-contexts on the cmd line. Something like this (again, shortened to just 
two nodes):

mpirun --map-by node --rank-by slot --bind-to core \
    --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp

The way our mapper currently works, it processes the app-contexts in order, so 
I _think_ this will get you what you want - might be worth a try. Kinda ugly, I 
know - but it should work, and all the app-contexts wind up in the same 
MPI_COMM_WORLD.
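
If you want to sanity-check the placement before a real run, you can have 
mpirun print the resulting map and bindings by adding the display options to 
the same command (output format varies a bit across OMPI versions):

mpirun --map-by node --rank-by slot --bind-to core --display-map --report-bindings \
    --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp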


On Jan 28, 2021, at 3:18 PM, Luis Cebamanos via users <users@lists.open-mpi.org> wrote:

That's right Ralph!

On 28/01/2021 23:13, Ralph Castain via users wrote:
Trying to wrap my head around this, so let me try a 2-node example. You want 
(each rank bound to a single core):

ranks 0-3 to be mapped onto node1
ranks 4-7 to be mapped onto node2
ranks 8-11 to be mapped onto node1
ranks 12-15 to be mapped onto node2
etc., etc.
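
(Generalizing: with a block size of 4 across 2 nodes, rank r would land on 
node (r / 4) mod 2, using integer division.)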

Correct?

On Jan 28, 2021, at 3:00 PM, Luis Cebamanos via users <users@lists.open-mpi.org> wrote:

Hello all,

What are the options for binding MPI tasks to blocks of cores per 
node/socket/NUMA domain in a round-robin fashion? Say I want to fully populate 
40-core sockets on dual-socket nodes, but in a round-robin fashion: binding 4 
cores on the first node, then 4 on the next, and so on. Would that be 
``--bind-to core``?

srun can do this with ``--distribution=plane``, so one could do ``srun 
--distribution=plane=4``.
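
For reference, a full invocation might look something like this (just a 
sketch - the 2 nodes, 16 tasks, and ./myapp are placeholders; see the Slurm 
docs for exact flag behavior):

srun --nodes=2 --ntasks=16 --distribution=plane=4 --cpu-bind=cores ./myapp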

cheers




