[ 
https://issues.apache.org/jira/browse/FLINK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33977:
-----------------------------------
    Labels: pull-request-available  (was: )

> Adaptive scheduler may not minimize the number of TMs during downscaling
> ------------------------------------------------------------------------
>
>                 Key: FLINK-33977
>                 URL: https://issues.apache.org/jira/browse/FLINK-33977
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler, Runtime / Coordination
>    Affects Versions: 1.18.0, 1.19.0, 1.20.0
>            Reporter: Zhanghao Chen
>            Assignee: RocMarshal
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: screenshot-1.png
>
>
> Adaptive Scheduler uses SlotAssigner to assign free slots to slot sharing 
> groups. Currently, there're two implementations of SlotAssigner available: 
> the 
> DefaultSlotAssigner that treats all slots and slot sharing groups equally and 
> the {color:#172b4d}StateLocalitySlotAssigner{color} that assigns slots based 
> on the number of local key groups to utilize local state recovery. The 
> scheduler will use the DefaultSlotAssigner when no key group assignment info 
> is available and use the StateLocalitySlotAssigner otherwise.
>  
> However, none of the SlotAssigner targets at minimizing the number of TMs, 
> which may produce suboptimal slot assignment under the Application Mode. For 
> example, when a job with 8 slot sharing groups and 2 TMs (each 4 slots) is 
> downscaled through the FLIP-291 API to have 4 slot sharing groups instead, 
> the cluster may still have 2 TMs, one with 1 free slot, and the other with 3 
> free slots. For end-users, this implies an ineffective downscaling as the 
> total cluster resources are not reduced.
>  
> We should take minimizing number of TMs into consideration as well. A 
> possible approach is to enhance the {color:#172b4d}StateLocalitySlotAssigner: 
> when the number of free slots exceeds need, sort all the TMs by a score 
> summing from the allocation scores of all slots on it, remove slots from the 
> excessive TMs with the lowest score and proceed the remaining slot 
> assignment.{color}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to