[ https://issues.apache.org/jira/browse/FLINK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
RocMarshal updated FLINK-33977: ------------------------------- Attachment: screenshot-1.png > Adaptive scheduler may not minimize the number of TMs during downscaling > ------------------------------------------------------------------------ > > Key: FLINK-33977 > URL: https://issues.apache.org/jira/browse/FLINK-33977 > Project: Flink > Issue Type: Improvement > Components: Autoscaler, Runtime / Coordination > Affects Versions: 1.18.0, 1.19.0, 1.20.0 > Reporter: Zhanghao Chen > Priority: Major > Attachments: screenshot-1.png > > > Adaptive Scheduler uses SlotAssigner to assign free slots to slot sharing > groups. Currently, there're two implementations of SlotAssigner available: > the > DefaultSlotAssigner that treats all slots and slot sharing groups equally and > the {color:#172b4d}StateLocalitySlotAssigner{color} that assigns slots based > on the number of local key groups to utilize local state recovery. The > scheduler will use the DefaultSlotAssigner when no key group assignment info > is available and use the StateLocalitySlotAssigner otherwise. > > However, none of the SlotAssigner targets at minimizing the number of TMs, > which may produce suboptimal slot assignment under the Application Mode. For > example, when a job with 8 slot sharing groups and 2 TMs (each 4 slots) is > downscaled through the FLIP-291 API to have 4 slot sharing groups instead, > the cluster may still have 2 TMs, one with 1 free slot, and the other with 3 > free slots. For end-users, this implies an ineffective downscaling as the > total cluster resources are not reduced. > > We should take minimizing number of TMs into consideration as well. A > possible approach is to enhance the {color:#172b4d}StateLocalitySlotAssigner: > when the number of free slots exceeds need, sort all the TMs by a score > summing from the allocation scores of all slots on it, remove slots from the > excessive TMs with the lowest score and proceed the remaining slot > assignment.{color} -- This message was sent by Atlassian Jira (v8.20.10#820010)