[jira] [Updated] (FLINK-36531) AutoScaler needs to consider the lag from last checkpoint

Sai Sharath Dandi (Jira) Mon, 14 Oct 2024 10:48:12 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-36531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sai Sharath Dandi updated FLINK-36531:
--------------------------------------
    Description: 
Autoscaler computes the target processing capacity as 
[below|https://sg.uberinternal.com/code.uber.internal/uber-code/data-flink-kubernetes-operator@release-1.9-uber/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47]

// Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL

 

During the scaling action, the autoscaler will restart the job from the last 
successful checkpoint, we need to include the number of processed records since 
last successful checkpoint as part of the lag as those records will be replayed 
after scaling. This is particularly important for jobs with long checkpoint 
intervals and large state as there could be a significant difference between 
the realtime lag and the lag from the checkpoint

  was:
Autoscaler computes the target processing capacity as 
[below|https://sg.uberinternal.com/code.uber.internal/uber-code/data-flink-kubernetes-operator@release-1.9-uber/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47]

// Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL



 

During the scaling action, the autoscaler will start from the last successful 
checkpoint, we need to include the number of processed records since last 
successful checkpoint as part of the lag as those records will be replayed 
after scaling.


> AutoScaler needs to consider the lag from last checkpoint
> ---------------------------------------------------------
>
>                 Key: FLINK-36531
>                 URL: https://issues.apache.org/jira/browse/FLINK-36531
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Sai Sharath Dandi
>            Priority: Major
>
> Autoscaler computes the target processing capacity as 
> [below|https://sg.uberinternal.com/code.uber.internal/uber-code/data-flink-kubernetes-operator@release-1.9-uber/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47]
> // Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + 
> INPUT_RATE/TARGET_UTIL
>  
> During the scaling action, the autoscaler will restart the job from the last 
> successful checkpoint, we need to include the number of processed records 
> since last successful checkpoint as part of the lag as those records will be 
> replayed after scaling. This is particularly important for jobs with long 
> checkpoint intervals and large state as there could be a significant 
> difference between the realtime lag and the lag from the checkpoint



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-36531) AutoScaler needs to consider the lag from last checkpoint

Reply via email to