[ https://issues.apache.org/jira/browse/FLINK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Fan updated FLINK-36863:
----------------------------
    Description: 

Since FLINK-36535, scaling down uses the maximum recommended parallelism observed since the scale-down trigger, because VertexDelayedScaleDownInfo only stores the maxRecommendedParallelism [1]. It's better to use the maximum parallelism in the {color:#de350b}past scale-down.interval window{color}.

h1. Reason:

Assuming the current parallelism is 100 and the scale-down interval is 1 hour, what's the difference between the two strategies?

The recommended parallelism at each point in time is:
 * 2024-12-09 00:00:00 -> 99 (triggers scale down)
 * 2024-12-09 00:30:00 -> 90
 * 2024-12-09 01:00:00 -> 80
 * 2024-12-09 01:30:00 -> 70
 * 2024-12-09 02:00:00 -> 60
 * 2024-12-09 02:30:00 -> 50
 * 2024-12-09 03:00:00 -> 40

With the current code in the main branch, 99 is used as the final parallelism at 2024-12-09 03:10:00, since we take the maxRecommendedParallelism from VertexDelayedScaleDownInfo. This is a bug: 99 is close to the current parallelism (100), so the recommended parallelism always stays within the utilization range, and the job or task never scales down.

Instead, we should use 50 as the final parallelism at 2024-12-09 03:10:00, because 50 is the maximum recommended parallelism in the past 1 hour. Since 50 is not within the utilization range, the scale down can be executed.

h1. Approach:

VertexDelayedScaleDownInfo maintains all recommended parallelisms observed within the past scale-down.interval window:
 * Evict the recommended parallelisms recorded before the scale-down.interval window.
 * Use the maximum parallelism within the window as the final parallelism.

Note: this is a scenario that computes the maximum value within a sliding window (see the sketch after the references below):
 * It is similar to LeetCode 239: Sliding Window Maximum [2].
 * If the latest parallelism is greater than or equal to a past parallelism, the past parallelism can never be the window maximum again, so the past value can be evicted.
 * We only need to maintain a list of parallelisms that is monotonically decreasing within the past window.
 * The first parallelism in the list is the final parallelism.

h1. Note:

This proposal is exactly what FLINK-36535 change1 expects, but I was not aware of this bug during my development. Sorry for that. :(
 * {color:#de350b}Change1{color}: Use the maximum parallelism within the window instead of the latest parallelism when scaling down.

[1] [https://github.com/apache/flink-kubernetes-operator/blob/d9e8cce85499f26ac0129a2f2d13a083d68b5c21/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/DelayedScaleDown.java#L42]
[2] [https://leetcode.com/problems/sliding-window-maximum/description/]
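For illustration, here is a minimal sketch of the monotonic-deque idea described in the Approach section. The class ParallelismWindow, the Recommendation record, and the record(...) method are hypothetical names invented for this sketch and do not reflect the actual VertexDelayedScaleDownInfo API:

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a sliding-window maximum over recommended
 * parallelisms, kept as a monotonically decreasing deque.
 */
public class ParallelismWindow {

    /** One recommended parallelism observed at a point in time. */
    private record Recommendation(Instant time, int parallelism) {}

    /** Deque kept monotonically decreasing in parallelism, oldest first. */
    private final Deque<Recommendation> window = new ArrayDeque<>();

    private final Duration scaleDownInterval;

    public ParallelismWindow(Duration scaleDownInterval) {
        this.scaleDownInterval = scaleDownInterval;
    }

    /** Records a new recommendation and returns the max within the window. */
    public int record(Instant now, int recommendedParallelism) {
        // Older recommendations that are <= the new one can never be the
        // window maximum again, so evict them from the tail.
        while (!window.isEmpty()
                && window.peekLast().parallelism() <= recommendedParallelism) {
            window.pollLast();
        }
        window.addLast(new Recommendation(now, recommendedParallelism));

        // Evict recommendations that fell out of the scale-down.interval
        // window. The deque is never empty here: the entry just added has
        // time == now, which is inside the window.
        Instant windowStart = now.minus(scaleDownInterval);
        while (window.peekFirst().time().isBefore(windowStart)) {
            window.pollFirst();
        }

        // The head holds the maximum parallelism within the window.
        return window.peekFirst().parallelism();
    }
}
{code}

Replaying the timeline above through record(...) and evaluating again at 2024-12-09 03:10:00 yields 50, since only the recommendations from the past hour remain in the deque and 50 is the largest of them.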
> Use the maximum parallelism in the past scale-down.interval window when
> scaling down
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-36863
>                 URL: https://issues.apache.org/jira/browse/FLINK-36863
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major