I am trying to map MPI processes to sockets in a somewhat compact pattern and 
I am wondering what the best way to do it is.

Say there are 2 sockets (0 and 1), each with 4 cores (0, 1, 2, 3), and 
I have 4 MPI processes, each of which will use 2 OpenMP threads.

I've re-ordered my parallel work such that pairs of ranks (0,1 and 2,3) 
communicate more with each other than with other ranks.  Thus I think the best 
mapping would be:

RANK   SOCKET   CORE
0      0        0
1      0        2
2      1        0
3      1        2

My understanding is that --bysocket --bind-to-socket will give me ranks 0 and 2 
on socket 0 and ranks 1 and 3 on socket 1, which is not what I want.
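
To make the difference concrete, here is a small sketch of the two placements. 
It only models the rank-to-socket arithmetic as I understand it and does not 
call Open MPI at all; the function names by_socket and compact are my own, 
made up for illustration.

```python
# Illustration only: models rank-to-socket placement, does not invoke MPI.
N_SOCKETS = 2
N_RANKS = 4

def by_socket(rank, n_sockets=N_SOCKETS):
    """Round-robin placement: consecutive ranks alternate between sockets."""
    return rank % n_sockets

def compact(rank, n_ranks=N_RANKS, n_sockets=N_SOCKETS):
    """Desired placement: fill one socket before moving on to the next."""
    ranks_per_socket = n_ranks // n_sockets
    return rank // ranks_per_socket

# Map each rank to the socket it would land on under each scheme.
round_robin = {r: by_socket(r) for r in range(N_RANKS)}
wanted = {r: compact(r) for r in range(N_RANKS)}

print("by socket:", round_robin)
print("compact:  ", wanted)
```

The first mapping splits each communicating pair across the two sockets; the 
second keeps ranks 0,1 together on socket 0 and ranks 2,3 together on socket 1.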

It looks like --cpus-per-proc might be what I want, i.e. it seems I could 
give it the value 2.  But it is unclear to me whether I would also need to give 
--bysocket, and the FAQ suggests this combination is untested.

Maybe a rankfile is what I need?
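
If a rankfile is the answer, I imagine it would look something like what the 
sketch below prints. This assumes the "rank N=host slot=socket:cores" syntax 
as I read it in the mpirun man page; the host name node01 is a placeholder, 
and the helper rankfile_lines is hypothetical, not part of Open MPI.

```python
def rankfile_lines(host, n_ranks=4, n_sockets=2, cores_per_socket=4, threads=2):
    """Build rankfile lines giving each rank `threads` adjacent cores,
    filling one socket completely before moving on to the next."""
    ranks_per_socket = n_ranks // n_sockets
    # Sanity check: the ranks on a socket must fit within its cores.
    assert ranks_per_socket * threads <= cores_per_socket, "not enough cores"
    lines = []
    for rank in range(n_ranks):
        socket = rank // ranks_per_socket
        first = (rank % ranks_per_socket) * threads  # first core for this rank
        last = first + threads - 1                   # last core for this rank
        lines.append(f"rank {rank}={host} slot={socket}:{first}-{last}")
    return lines

# "node01" is a placeholder host name, assumed for illustration.
lines = rankfile_lines("node01")
print("\n".join(lines))
```

I would then presumably pass the resulting file to mpirun with -rf, but I have 
not tried this yet.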

I would appreciate some advice on the easiest way to get this mapping.

Thanks
