I am trying to map MPI processes to sockets in a fairly compact pattern, and I am wondering what the best way to do it is.
Say there are 2 sockets (0 and 1), each processor has 4 cores (0, 1, 2, 3), and I have 4 MPI processes, each of which will use 2 OpenMP threads. I've reordered my parallel work so that pairs of ranks (0,1 and 2,3) communicate more with each other than with the other ranks. So I think the best mapping would be the following, where CORE is the first of the two cores used by that rank's threads:

    RANK  SOCKET  CORE
    0     0       0
    1     0       2
    2     1       0
    3     1       2

My understanding is that --bysocket --bind-to-socket will give me ranks 0 and 2 on socket 0 and ranks 1 and 3 on socket 1, which is not what I want. It looks like --cpus-per-proc might be what I want, i.e. I might give it the value 2. But it was unclear to me whether I would also need to give --bysocket, and the FAQ suggests this combination is untested. Maybe a rankfile is what I need?

I would appreciate some advice on the easiest way to get this mapping. Thanks
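If a rankfile does turn out to be the answer, here is a small sketch of how the mapping above could be generated. It assumes the Open MPI rankfile line format `rank N=hostname slot=socket:core-range`; the hostname `node01` and the function name `compact_rankfile` are hypothetical, and whether core numbers are interpreted as logical or physical may depend on the Open MPI version, so check the mpirun man page for yours.

```python
def compact_rankfile(hostname: str, n_ranks: int, sockets: int,
                     cores_per_socket: int, threads_per_rank: int) -> str:
    """Return rankfile text that packs consecutive ranks onto the same
    socket and gives each rank `threads_per_rank` adjacent cores.
    Assumes Open MPI's 'rank N=host slot=socket:first-last' syntax."""
    if cores_per_socket % threads_per_rank != 0:
        raise ValueError("threads_per_rank must divide cores_per_socket")
    ranks_per_socket = cores_per_socket // threads_per_rank
    if n_ranks > sockets * ranks_per_socket:
        raise ValueError("not enough cores for the requested ranks")
    lines = []
    for rank in range(n_ranks):
        # Consecutive ranks share a socket: ranks 0,1 -> socket 0; 2,3 -> socket 1.
        socket = rank // ranks_per_socket
        # Each rank gets a contiguous, inclusive core range for its threads.
        first = (rank % ranks_per_socket) * threads_per_rank
        last = first + threads_per_rank - 1
        lines.append(f"rank {rank}={hostname} slot={socket}:{first}-{last}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 'node01' is a hypothetical hostname; replace it with the real node name.
    print(compact_rankfile("node01", n_ranks=4, sockets=2,
                           cores_per_socket=4, threads_per_rank=2), end="")
```

For the 2-socket, 4-core case this prints `rank 0=node01 slot=0:0-1`, `rank 1=node01 slot=0:2-3`, `rank 2=node01 slot=1:0-1` and `rank 3=node01 slot=1:2-3`. Saved to a file, it would presumably be passed with something like `mpirun -np 4 -rf myrankfile ./a.out` together with `OMP_NUM_THREADS=2`.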