Hmmm...well, the proc distribution is easy: you would just --map-by node. The tricky thing is assigning the ranks in the pattern you desire. We definitely don't have that pattern in our ranking algo today, though it wouldn't be hard to add.
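(For illustration - a sketch of today's behavior, not part of the original message: with plain --map-by node, ranks follow the mapping round-robin one proc per node, so on two nodes rank 0 lands on node1, rank 1 on node2, rank 2 on node1, and so on - alternating singly rather than in blocks of four. "myapp" is a placeholder.)

    mpirun --map-by node --bind-to core -np 16 myapp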
However, the new ranking option wouldn't be available until OMPI v5 is released later this year. Are you going to be using this long-term enough to warrant a dedicated option?

Of course, you could fake it out even today by breaking it into multiple app-contexts on the cmd line. Something like this (again, shortening it to just two nodes):

    mpirun --map-by node --rank-by slot --bind-to core --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp

The way our mapper currently works, it will process the app-contexts in order. So I _think_ this will get what you want - might be worth a try. Kinda ugly, I know - but it might work, and all the app-contexts wind up in MPI_COMM_WORLD.

On Jan 28, 2021, at 3:18 PM, Luis Cebamanos via users <users@lists.open-mpi.org> wrote:

> That's right Ralph!
>
> On 28/01/2021 23:13, Ralph Castain via users wrote:
>> Trying to wrap my head around this, so let me try a 2-node example. You want (each rank bound to a single core):
>>
>> ranks 0-3 mapped onto node1
>> ranks 4-7 mapped onto node2
>> ranks 8-11 mapped onto node1
>> ranks 12-15 mapped onto node2
>> etc., etc.
>>
>> Correct?
>>
>> On Jan 28, 2021, at 3:00 PM, Luis Cebamanos via users <users@lists.open-mpi.org> wrote:
>>
>>> Hello all,
>>>
>>> What are the options for binding MPI tasks to blocks of cores per node/socket/NUMA domain in a round-robin fashion? Say I want to fully populate 40-core sockets on dual-socket nodes, but in a round-robin fashion: binding 4 cores on the first node, then 4 cores on the next, and so on. Would it be ``--bind-to core``? srun can do this with ``--distribution=plane``, so one could do ``srun --distribution=plane=4``.
>>>
>>> cheers
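(FWIW, one way to check whichever invocation you settle on: Open MPI's mpirun supports --report-bindings, which prints each rank's actual binding as it launches. A sketch, reusing the placeholder "myapp" from above:)

    mpirun --report-bindings --map-by node --rank-by slot --bind-to core \
        --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp : --np 8 myapp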