[jira] [Comment Edited] (FLINK-36531) AutoScaler needs to consider the lag from last checkpoint

yuanfenghu (Jira) Mon, 14 Oct 2024 19:23:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-36531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889404#comment-17889404
 ]


yuanfenghu edited comment on FLINK-36531 at 10/15/24 2:22 AM:
--------------------------------------------------------------

[~dsaisharath]
There is already an optimization that addresses this problem: 
[FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler].
It may be helpful to you.
 


was (Author: JIRAUSER296932):
[~dsaisharath]
There is already an optimization  to solve 
t[FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler]his
 problem, it may be helpful to you
 

> AutoScaler needs to consider the lag from last checkpoint
> ---------------------------------------------------------
>
>                 Key: FLINK-36531
>                 URL: https://issues.apache.org/jira/browse/FLINK-36531
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Sai Sharath Dandi
>            Priority: Major
>
> The autoscaler computes the target processing capacity as 
> [below|https://sg.uberinternal.com/code.uber.internal/uber-code/data-flink-kubernetes-operator@release-1.9-uber/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47]:
> // Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL
>  
> During a scaling action, the autoscaler restarts the job from the last 
> successful checkpoint. The lag should therefore include the number of records 
> processed since the last successful checkpoint, because those records will be 
> replayed after scaling. This is particularly important for jobs with long 
> checkpoint intervals and large state, where the real-time lag and the lag 
> measured from the last checkpoint can differ significantly.
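
A minimal sketch of the proposed adjustment, assuming the formula quoted above and a hypothetical recordsSinceCheckpoint input (how that count would be obtained is not specified in this issue). The class name, method name, and parameters are illustrative only and are not the autoscaler's actual API.

```java
import java.time.Duration;

/** Illustrative only: not the actual AutoScalerUtils implementation. */
final class TargetCapacitySketch {

    /**
     * Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL,
     * with LAG extended by the records that would be replayed after the restart.
     *
     * @param lag                    current real-time lag, in records
     * @param recordsSinceCheckpoint hypothetical input: records processed since the
     *                               last successful checkpoint (pass 0 for the original formula)
     * @param inputRate              incoming records per second
     * @param restart                expected restart duration
     * @param catchUp                desired catch-up duration (must be positive)
     * @param targetUtil             target utilization in (0, 1]
     */
    static double targetCapacity(
            double lag,
            double recordsSinceCheckpoint,
            double inputRate,
            Duration restart,
            Duration catchUp,
            double targetUtil) {
        double catchUpSec = catchUp.toMillis() / 1000.0;
        double restartSec = restart.toMillis() / 1000.0;
        if (catchUpSec <= 0 || targetUtil <= 0) {
            throw new IllegalArgumentException("catchUp and targetUtil must be positive");
        }
        // Records processed after the last checkpoint are replayed, so count them as lag.
        double effectiveLag = lag + recordsSinceCheckpoint;
        return effectiveLag / catchUpSec
                + inputRate * restartSec / catchUpSec
                + inputRate / targetUtil;
    }

    public static void main(String[] args) {
        Duration restart = Duration.ofSeconds(60);
        Duration catchUp = Duration.ofSeconds(600);
        // Same job, without and with 120,000 records processed since the last checkpoint.
        System.out.println(targetCapacity(60_000, 0, 1_000, restart, catchUp, 0.8));
        System.out.println(targetCapacity(60_000, 120_000, 1_000, restart, catchUp, 0.8));
    }
}
```

With these made-up numbers the target rises from roughly 1450 to roughly 1650 records per second, which illustrates why a long checkpoint interval can make the real-time lag a noticeable underestimate.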



