[ https://issues.apache.org/jira/browse/FLINK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Fan updated FLINK-36863:
----------------------------
    Description: 

Since FLINK-36535, scaling down uses the maximum recommended parallelism observed since the scale-down trigger, because VertexDelayedScaleDownInfo only stores the maxRecommendedParallelism [1]. It's better to use the maximum parallelism in the {color:#de350b}past scale-down.interval window{color}.

h1. Reason:

Assuming the current parallelism is 100 and the scale-down interval is 1 hour, what's the difference between the two strategies?

The recommended parallelism at each point in time is:
 * 2024-12-09 00:00:00 -> 99 (triggers scale down)
 * 2024-12-09 00:30:00 -> 90
 * 2024-12-09 01:00:00 -> 80
 * 2024-12-09 01:30:00 -> 70
 * 2024-12-09 02:00:00 -> 60
 * 2024-12-09 02:30:00 -> 50
 * 2024-12-09 03:00:00 -> 40

With the current code in the main branch, 99 is used as the final parallelism at 2024-12-09 03:10:00, since we take the maxRecommendedParallelism from VertexDelayedScaleDownInfo. This is a bug: 99 is close to the current parallelism (100), so the recommended parallelism always stays within the utilization range, and the job or task never scales down.

Instead, we should use 50 as the final parallelism at 2024-12-09 03:10:00, because 50 is the maximum recommended parallelism in the past 1 hour. Since 50 is not within the utilization range, the scale down can be executed.

h1. Approach:

VertexDelayedScaleDownInfo maintains all recommended parallelisms observed within the past scale-down.interval window:
 * Evict the recommended parallelisms recorded before the scale-down.interval window.
 * Use the maximum parallelism within the window as the final parallelism.

Note: this is a scenario that computes the maximum value within a sliding window (see the sketch after the references below):
 * It is similar to LeetCode 239: Sliding Window Maximum [2].
 * If the latest parallelism is greater than or equal to a past parallelism, the past parallelism can never be the window maximum again, so the past value can be evicted.
 * We only need to maintain a list of parallelisms that is monotonically decreasing within the past window.
 * The first parallelism in the list is the final parallelism.

h1. Note:

This proposal is exactly what FLINK-36535 change1 expects, but I was not aware of this bug during my development. Sorry for that. :(
 * {color:#de350b}Change1{color}: Use the maximum parallelism within the window instead of the latest parallelism when scaling down.

[1] [https://github.com/apache/flink-kubernetes-operator/blob/d9e8cce85499f26ac0129a2f2d13a083d68b5c21/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/DelayedScaleDown.java#L42]
[2] [https://leetcode.com/problems/sliding-window-maximum/description/]
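For illustration, here is a minimal sketch of the monotonic-deque idea described in the Approach section. The class ParallelismWindow, the Recommendation record, and the record(...) method are hypothetical names invented for this sketch and do not reflect the actual VertexDelayedScaleDownInfo API:

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a sliding-window maximum over recommended
 * parallelisms, kept as a monotonically decreasing deque.
 */
public class ParallelismWindow {

    /** One recommended parallelism observed at a point in time. */
    private record Recommendation(Instant time, int parallelism) {}

    /** Deque kept monotonically decreasing in parallelism, oldest first. */
    private final Deque<Recommendation> window = new ArrayDeque<>();

    private final Duration scaleDownInterval;

    public ParallelismWindow(Duration scaleDownInterval) {
        this.scaleDownInterval = scaleDownInterval;
    }

    /** Records a new recommendation and returns the max within the window. */
    public int record(Instant now, int recommendedParallelism) {
        // Older recommendations that are <= the new one can never be the
        // window maximum again, so evict them from the tail.
        while (!window.isEmpty()
                && window.peekLast().parallelism() <= recommendedParallelism) {
            window.pollLast();
        }
        window.addLast(new Recommendation(now, recommendedParallelism));

        // Evict recommendations that fell out of the scale-down.interval
        // window. The deque is never empty here: the entry just added has
        // time == now, which is inside the window.
        Instant windowStart = now.minus(scaleDownInterval);
        while (window.peekFirst().time().isBefore(windowStart)) {
            window.pollFirst();
        }

        // The head holds the maximum parallelism within the window.
        return window.peekFirst().parallelism();
    }
}
{code}

Replaying the timeline above through record(...) and evaluating again at 2024-12-09 03:10:00 yields 50, since only the recommendations from the past hour remain in the deque and 50 is the largest of them.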
> Use the maximum parallelism in the past scale-down.interval window when
> scaling down
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-36863
>                 URL: https://issues.apache.org/jira/browse/FLINK-36863
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major