Yes, I've seen a couple of open issues mentioning this problem. I'll be using rankfile until v5.0.
Cheers

On 01/03/2021 16:20, Ralph Castain via users wrote:
> Sounds like a bug in that release - you may have to wait for OMPI v5.0
> for a fix.
>
>> On Mar 1, 2021, at 7:43 AM, Luis Cebamanos via users
>> <users@lists.open-mpi.org> wrote:
>>
>> I am afraid --map-by ppr:32:socket --bind-to core --cpu-list
>> 0,2,4,6... somehow conflicts internally with other policies. I have
>> also tried with --cpu-set with identical results. Probably rankfile
>> is my only option too.
>>
>> On 28/02/2021 22:44, Ralph Castain via users wrote:
>>> The only way I know of to do what you want is
>>>
>>> --map-by ppr:32:socket --bind-to core --cpu-list 0,2,4,6,...
>>>
>>> where you list out the exact cpus you want to use.
>>>
>>>
>>>> On Feb 28, 2021, at 9:58 AM, Luis Cebamanos via users
>>>> <users@lists.open-mpi.org> wrote:
>>>>
>>>> I could do --map-by ppr:32:socket:PE=1 --bind-to core (output below)
>>>> but I cannot see a way of mapping to every second core: 0,2,4,...
>>>>
>>>> [epsilon110:1489563] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>>>> [BB/../../..
>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../../../../../../../../../../../../..]
>>>> [epsilon110:1489563] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
>>>> [../BB/../..
>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>> ../../../../../../../../../../../../../../../../../..]
>>>>
>>>> On 28/02/2021 16:24, Ralph Castain via users wrote:
>>>>> Did you read the documentation on rankfile? The "slot=N" directive
>>>>> says to "put this proc on core N". In your file, you stipulate that
>>>>>
>>>>> rank 0 is to be placed solely on core 0
>>>>> rank 1 is to be placed solely on core 2
>>>>> etc.
>>>>>
>>>>> That is not what you asked for in your mpirun cmd. You asked that
>>>>> each proc be mapped to TWO cores (PE=2) or FOUR threads (PE=4 with
>>>>> bind-to HWT). If you wanted that same thing in a rankfile, it
>>>>> should have said
>>>>>
>>>>> rank 0 slots=0-1
>>>>> rank 1 slots=2-3
>>>>> etc.
>>>>>
>>>>> Hence the difference. I was simply correcting your mpirun cmd line
>>>>> as you said you wanted two CORES, and that isn't guaranteed if you
>>>>> are stipulating things in terms of HWTs as not every machine has
>>>>> two HWTs/core.
>>>>>
>>>>>
>>>>>
>>>>>> On Feb 28, 2021, at 7:43 AM, Luis Cebamanos via users
>>>>>> <users@lists.open-mpi.org> wrote:
>>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>> Thanks for this, however --map-by ppr:32:socket:PE=2 --bind-to
>>>>>> core reports the same binding as --map-by ppr:32:socket:PE=4
>>>>>> --bind-to hwthread:
>>>>>>
>>>>>> [epsilon104:2861230] MCW rank 0 bound to socket 0[core 0[hwt
>>>>>> 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../..]
>>>>>> [epsilon104:2861230] MCW rank 1 bound to socket 0[core 2[hwt
>>>>>> 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../
>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../..]
>>>>>> [epsilon104:2861230] MCW rank 2 bound to socket 0[core 4[hwt
>>>>>> 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/
>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>> ../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..
>>>>>> /../../../../../../../..]
>>>>>>
>>>>>> And this is still different from the output produced using the
>>>>>> rankfile.
>>>>>>
>>>>>> Cheers,
>>>>>> Luis
>>>>>>
>>>>>> On 28/02/2021 14:06, Ralph Castain via users wrote:
>>>>>>> Your command line is incorrect:
>>>>>>>
>>>>>>> --map-by ppr:32:socket:PE=4 --bind-to hwthread
>>>>>>>
>>>>>>> should be
>>>>>>>
>>>>>>> --map-by ppr:32:socket:PE=2 --bind-to core
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Feb 28, 2021, at 5:57 AM, Luis Cebamanos via users
>>>>>>>> <users@lists.open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> I should have said, "I would like to run 128 MPI processes on 2
>>>>>>>> nodes" and not 64 like I initially said...
>>>>>>>>
>>>>>>>> On Sat, 27 Feb 2021, 15:03 Luis Cebamanos, <luic...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello OMPI users,
>>>>>>>>
>>>>>>>> On 128-core nodes, 2 sockets x 64 cores/socket (2 hwthreads/core),
>>>>>>>> I am trying to match the behavior of running with a rankfile
>>>>>>>> with manual mapping/ranking/binding.
>>>>>>>>
>>>>>>>> I would like to run 64 MPI processes on 2 nodes, 1 MPI process
>>>>>>>> every 2 cores. That is, I want to run 32 MPI processes per socket
>>>>>>>> on 2 128-core nodes. My mapping should be something like:
>>>>>>>>
>>>>>>>> Node 0
>>>>>>>> =====
>>>>>>>> rank 0 - core 0
>>>>>>>> rank 1 - core 2
>>>>>>>> rank 2 - core 4
>>>>>>>> ...
>>>>>>>> rank 63 - core 126
>>>>>>>>
>>>>>>>>
>>>>>>>> Node 1
>>>>>>>> ====
>>>>>>>> rank 64 - core 0
>>>>>>>> rank 65 - core 2
>>>>>>>> rank 66 - core 4
>>>>>>>> ...
>>>>>>>> rank 127 - core 126
>>>>>>>>
>>>>>>>> If I use a rankfile:
>>>>>>>> rank 0=epsilon102 slot=0
>>>>>>>> rank 1=epsilon102 slot=2
>>>>>>>> rank 2=epsilon102 slot=4
>>>>>>>> rank 3=epsilon102 slot=6
>>>>>>>> rank 4=epsilon102 slot=8
>>>>>>>> rank 5=epsilon102 slot=10
>>>>>>>> ....
>>>>>>>> rank 123=epsilon103 slot=118
>>>>>>>> rank 124=epsilon103 slot=120
>>>>>>>> rank 125=epsilon103 slot=122
>>>>>>>> rank 126=epsilon103 slot=124
>>>>>>>> rank 127=epsilon103 slot=126
>>>>>>>>
>>>>>>>> My --report-bindings output looks like:
>>>>>>>>
>>>>>>>> [epsilon102:2635370] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>>>>>>>> [BB/../../..
>>>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../../../../../../../../../../../../..]
>>>>>>>> [epsilon102:2635370] MCW rank 1 bound to socket 0[core 2[hwt 0-1]]:
>>>>>>>> [../../BB/..
>>>>>>>> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../../../../../../../../../../../../..]
>>>>>>>> [epsilon102:2635370] MCW rank 2 bound to socket 0[core 4[hwt 0-1]]:
>>>>>>>> [../../../..
>>>>>>>> /BB/../../../../../../../../../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../../../../../../../../../../../../..]
>>>>>>>>
>>>>>>>>
>>>>>>>> However, I cannot match this --report-bindings output by
>>>>>>>> manually using --map-by and --bind-to. I had the impression
>>>>>>>> that this would be the same:
>>>>>>>>
>>>>>>>> mpirun -np $SLURM_NTASKS --report-bindings --map-by ppr:32:socket:PE=4 --bind-to hwthread
>>>>>>>>
>>>>>>>> But this output is not quite the same:
>>>>>>>>
>>>>>>>> [epsilon102:2631529] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
>>>>>>>> socket 0[core 1[hwt 0-1]]:
>>>>>>>> [BB/BB/../../../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../../../../../../../../../../..][../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>>>>>>>> [epsilon102:2631529] MCW rank 1 bound to socket 0[core 2[hwt 0-1]],
>>>>>>>> socket 0[core 3[hwt 0-1]]:
>>>>>>>> [../../BB/BB/../../../../../../../../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../../../../../../../../../../..][../../../../../../../../../../.
>>>>>>>> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
>>>>>>>> ../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>>>>>>>>
>>>>>>>> What am I missing to match the rankfile behavior? Regarding
>>>>>>>> performance, what difference does it make between the first
>>>>>>>> and the second outputs?
>>>>>>>>
>>>>>>>> Thanks for your help!
>>>>>>>> Luis
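
Since the thread settles on a rankfile as the workaround, here is a minimal sketch of how the 128-line rankfile (and the matching --cpu-list string) could be generated instead of typed by hand. The host names epsilon102 and epsilon103 and the "rank N=host slot=M" syntax come from the messages above; the function names and defaults are hypothetical, chosen only for this illustration, and the layout assumes 64 ranks per node on even-numbered cores.

```python
def build_rankfile(hosts, ranks_per_host=64, core_stride=2):
    """Return rankfile text that places one rank on every core_stride-th core.

    Ranks are numbered consecutively across hosts, so with two hosts and
    64 ranks per host, ranks 0-63 land on the first host and 64-127 on the
    second, each on cores 0, 2, 4, ..., 126.
    """
    lines = []
    rank = 0
    for host in hosts:
        for i in range(ranks_per_host):
            # Same syntax as the rankfile quoted in the thread.
            lines.append(f"rank {rank}={host} slot={i * core_stride}")
            rank += 1
    return "\n".join(lines) + "\n"


def build_cpu_list(n_cores=128, core_stride=2):
    """Return the comma-separated core list 0,2,4,... up to n_cores - 1."""
    return ",".join(str(core) for core in range(0, n_cores, core_stride))


if __name__ == "__main__":
    # Host names taken from the thread; adjust to the actual allocation.
    print(build_rankfile(["epsilon102", "epsilon103"]), end="")
```

Redirecting the script's output to a file gives something that can be passed to mpirun through its rankfile option (--rankfile in the 4.x series, as far as I know), together with --report-bindings to check the result against the bindings shown above.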