Re: [OMPI users] Mapping, binding and ranking

Luis Cebamanos via users Mon, 01 Mar 2021 08:33:03 -0800

Yes, I've seen a couple of open issues mentioning this problem. I'll be
using rankfile until v5.0.
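
For anyone hitting the same thing, here is a minimal sketch of how such a
rankfile could be generated instead of typed by hand. It assumes the two
hostnames and the "one rank on every second core" layout from my original
message; make_rankfile and its parameters are illustrative names, not part
of Open MPI.

```python
def make_rankfile(hosts, ranks_per_host=64, stride=2):
    """Return rankfile text with consecutive ranks on every `stride`-th core.

    Assumption: each host gets `ranks_per_host` ranks, and the rank with
    local index i goes to core i * stride, as in the layout quoted below.
    """
    lines = []
    rank = 0
    for host in hosts:
        for local_index in range(ranks_per_host):
            lines.append(f"rank {rank}={host} slot={local_index * stride}")
            rank += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hostnames taken from the thread; replace with the allocated nodes.
    print(make_rankfile(["epsilon102", "epsilon103"]), end="")
```

The output can be redirected to a file and handed to mpirun through its
rankfile option (--rankfile in the 4.x series, as far as I know).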


Cheers

On 01/03/2021 16:20, Ralph Castain via users wrote:
> Sounds like a bug in that release - you may have to wait for OMPI v5.0
> for a fix.
>
>> On Mar 1, 2021, at 7:43 AM, Luis Cebamanos via users
>> <users@lists.open-mpi.org> wrote:
>>
>> I am afraid --map-by ppr:32:socket --bind-to core --cpu-list
>> 0,2,4,6... somehow conflicts internally with other policies. I have
>> also tried with --cpu-set with identical results. Probably rankfile
>> is my only option too.
>>
>> On 28/02/2021 22:44, Ralph Castain via users wrote:
>>> The only way I know of to do what you want is
>>>
>>> --map-by ppr:32:socket --bind-to core --cpu-list 0,2,4,6,...
>>>
>>> where you list out the exact cpus you want to use.
>>>
>>>
>>>> On Feb 28, 2021, at 9:58 AM, Luis Cebamanos via users
>>>> <users@lists.open-mpi.org> wrote:
>>>>
>>>> I could do --map-by ppr:32:socket:PE=1 --bind-to core (output below)
>>>> but I cannot see how to map onto every second core (0,2,4,...).
>>>>
>>>>  [epsilon110:1489563] MCW rank 0 bound to socket 0[core 0[hwt
>>>> 0-1]]: [BB/../../..
>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../../../../../../../../../../../../..]
>>>> [epsilon110:1489563] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
>>>> [../BB/../..
>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../../../../../../../../../../../../..]
>>>>
>>>> On 28/02/2021 16:24, Ralph Castain via users wrote:
>>>>> Did you read the documentation on rankfile? The "slot=N" directive
>>>>> says to "put this proc on core N". In your file, you stipulate that
>>>>>
>>>>> rank 0 is to be placed solely on core 0
>>>>> rank 1 is to be placed solely on core 2
>>>>> etc.
>>>>>
>>>>> That is not what you asked for in your mpirun cmd. You asked that
>>>>> each proc be mapped to TWO cores (PE=2) or FOUR threads (PE=4 with
>>>>> bind-to HWT). If you wanted that same thing in a rankfile, it
>>>>> should have said
>>>>>
>>>>> rank 0 slots=0-1
>>>>> rank 1 slots=2-3
>>>>> etc.
>>>>>
>>>>> Hence the difference. I was simply correcting your mpirun cmd line
>>>>> as you said you wanted two CORES, and that isn't guaranteed if you
>>>>> are stipulating things in terms of HWTs as not every machine has
>>>>> two HWTs/core.
>>>>>
>>>>>
>>>>>
>>>>>> On Feb 28, 2021, at 7:43 AM, Luis Cebamanos via users
>>>>>> <users@lists.open-mpi.org> wrote:
>>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>> Thanks for this. However, --map-by ppr:32:socket:PE=2 --bind-to
>>>>>> core reports the same binding as --map-by ppr:32:socket:PE=4
>>>>>> --bind-to hwthread:
>>>>>>
>>>>>> [epsilon104:2861230] MCW rank 0 bound to socket 0[core 0[hwt
>>>>>> 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../..]
>>>>>> [epsilon104:2861230] MCW rank 1 bound to socket 0[core 2[hwt
>>>>>> 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../
>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../..]
>>>>>> [epsilon104:2861230] MCW rank 2 bound to socket 0[core 4[hwt
>>>>>> 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/
>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../..]
>>>>>>
>>>>>> And this is still different from the output produced using the
>>>>>> rankfile.
>>>>>>
>>>>>> Cheers,
>>>>>> Luis
>>>>>>
>>>>>> On 28/02/2021 14:06, Ralph Castain via users wrote:
>>>>>>> Your command line is incorrect:
>>>>>>>
>>>>>>> --map-by ppr:32:socket:PE=4 --bind-to hwthread
>>>>>>>
>>>>>>> should be
>>>>>>>
>>>>>>> --map-by ppr:32:socket:PE=2 --bind-to core
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Feb 28, 2021, at 5:57 AM, Luis Cebamanos via users
>>>>>>>> <users@lists.open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> I should have said, "I would like to run 128 MPI processes on 2
>>>>>>>> nodes" and not 64 like I initially said...
>>>>>>>>
>>>>>>>> On Sat, 27 Feb 2021, 15:03 Luis Cebamanos, <luic...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>     Hello OMPI users,
>>>>>>>>
>>>>>>>>     On 128 core nodes, 2 sockets x 64 cores/socket (2
>>>>>>>>     hwthreads/core) , I am
>>>>>>>>     trying to match the behavior of running with a rankfile
>>>>>>>>     with manual
>>>>>>>>     mapping/ranking/binding.
>>>>>>>>
>>>>>>>>     I would like to run 64 MPI processes on 2 nodes, 1 MPI
>>>>>>>>     process every 2
>>>>>>>>     cores. This is, I want to run 32 MPI processes per socket
>>>>>>>>     on 2 128-core
>>>>>>>>     nodes. My mapping should be something like:
>>>>>>>>
>>>>>>>>     Node 0
>>>>>>>>     =====
>>>>>>>>     rank 0  -  core 0
>>>>>>>>     rank 1  -  core 2
>>>>>>>>     rank 2  -  core 4
>>>>>>>>     ...
>>>>>>>>     rank 63 - core 126
>>>>>>>>
>>>>>>>>
>>>>>>>>     Node 1
>>>>>>>>     ====
>>>>>>>>     rank 64  -  core 0
>>>>>>>>     rank 65  -  core 2
>>>>>>>>     rank 66 -   core 4
>>>>>>>>     ...
>>>>>>>>     rank 127- core 126
>>>>>>>>
>>>>>>>>     If I use a rankfile:
>>>>>>>>     rank 0=epsilon102 slot=0
>>>>>>>>     rank 1=epsilon102 slot=2
>>>>>>>>     rank 2=epsilon102 slot=4
>>>>>>>>     rank 3=epsilon102 slot=6
>>>>>>>>     rank 4=epsilon102 slot=8
>>>>>>>>     rank 5=epsilon102 slot=10
>>>>>>>>     ....
>>>>>>>>     rank 123=epsilon103 slot=118
>>>>>>>>     rank 124=epsilon103 slot=120
>>>>>>>>     rank 125=epsilon103 slot=122
>>>>>>>>     rank 126=epsilon103 slot=124
>>>>>>>>     rank 127=epsilon103 slot=126
>>>>>>>>
>>>>>>>>     My --report-binding looks like:
>>>>>>>>
>>>>>>>>     [epsilon102:2635370] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>>>>>>>>     [BB/../../..
>>>>>>>>     /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../../../../../../../../../../../../..]
>>>>>>>>     [epsilon102:2635370] MCW rank 1 bound to socket 0[core 2[hwt 0-1]]:
>>>>>>>>     [../../BB/..
>>>>>>>>     /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../../../../../../../../../../../../..]
>>>>>>>>     [epsilon102:2635370] MCW rank 2 bound to socket 0[core 4[hwt 0-1]]:
>>>>>>>>     [../../../..
>>>>>>>>     /BB/../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../../../../../../../../../../../../..]
>>>>>>>>
>>>>>>>>
>>>>>>>>     However, I cannot match this report-binding output by
>>>>>>>>     manually using
>>>>>>>>     --map-by and --bind-to. I had the impression that this will
>>>>>>>>     be the same:
>>>>>>>>
>>>>>>>>     mpirun -np $SLURM_NTASKS  --report-bindings --map-by
>>>>>>>>     ppr:32:socket:PE=4
>>>>>>>>     --bind-to hwthread
>>>>>>>>
>>>>>>>>     But this output is not quite the same:
>>>>>>>>
>>>>>>>>     [epsilon102:2631529] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
>>>>>>>>     socket 0[core 1[hwt 0-1]]:
>>>>>>>>     [BB/BB/../../../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../../../../../../../../../../..][../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>>>>>>>>     [epsilon102:2631529] MCW rank 1 bound to socket 0[core 2[hwt 0-1]],
>>>>>>>>     socket 0[core 3[hwt 0-1]]:
>>>>>>>>     [../../BB/BB/../../../../../../../../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../../../../../../../../../../..][../../../../../../../../../../.
>>>>>>>>     ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>>     ../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>>>>>>>>
>>>>>>>>     What am I missing to match the rankfile behavior? Regarding
>>>>>>>>     performance,
>>>>>>>>     what difference does it make between the first and the
>>>>>>>>     second outputs?
>>>>>>>>
>>>>>>>>     Thanks for your help!
>>>>>>>>     Luis
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
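
As a recap of the difference discussed in this thread, the sketch below
works out which cores each rank ends up on under the three setups that were
tried. It is only a model of the arithmetic, not Open MPI's mapper: it
assumes 2 hwthreads per core (as stated in the original message), hwthreads
numbered consecutively within each core, and ranks filling a socket in
order. All function names are hypothetical.

```python
HWT_PER_CORE = 2  # from the thread: 2 hwthreads/core


def rankfile_cores(local_rank, stride=2):
    """Rankfile with slot=<stride * local_rank>: exactly one core per rank."""
    return [local_rank * stride]


def pe_core_cores(local_rank, pe=2):
    """PE=<pe> with --bind-to core: pe contiguous cores per rank."""
    start = local_rank * pe
    return list(range(start, start + pe))


def pe_hwthread_cores(local_rank, pe=4, hwt_per_core=HWT_PER_CORE):
    """PE=<pe> with --bind-to hwthread: cores that own the pe hwthreads.

    Assumption: hwthread h belongs to core h // hwt_per_core.
    """
    first = local_rank * pe
    return sorted({h // hwt_per_core for h in range(first, first + pe)})


if __name__ == "__main__":
    for r in range(3):
        print(r, rankfile_cores(r), pe_core_cores(r), pe_hwthread_cores(r))
```

Under these assumptions PE=2 with --bind-to core and PE=4 with --bind-to
hwthread cover the same two cores per rank, while the rankfile binds each
rank to a single core and leaves the neighbouring core idle, which matches
the --report-bindings outputs quoted above.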
