[ https://issues.apache.org/jira/browse/FLINK-36531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889404#comment-17889404 ]
yuanfenghu edited comment on FLINK-36531 at 10/15/24 2:22 AM:
--------------------------------------------------------------

[~dsaisharath] There is already an optimization that addresses this problem: [FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler]. It may be helpful to you.

was (Author: JIRAUSER296932):
[~dsaisharath] There is already an optimization to solve t[FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler]his problem, it may be helpful to you

> AutoScaler needs to consider the lag from last checkpoint
> ---------------------------------------------------------
>
>                 Key: FLINK-36531
>                 URL: https://issues.apache.org/jira/browse/FLINK-36531
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Sai Sharath Dandi
>            Priority: Major
>
> Autoscaler computes the target processing capacity as
> [below|https://sg.uberinternal.com/code.uber.internal/uber-code/data-flink-kubernetes-operator@release-1.9-uber/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47]
> // Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL
>
> During the scaling action, the autoscaler restarts the job from the last
> successful checkpoint, so we need to include the number of records processed
> since the last successful checkpoint as part of the lag, because those records
> will be replayed after scaling. This is particularly important for jobs with
> long checkpoint intervals and large state, where there can be a significant
> difference between the real-time lag and the lag measured from the checkpoint.
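
The formula quoted in the description, and the adjustment the issue asks for, can be sketched roughly as follows. This is a minimal illustration and not the operator's actual code: the class, method, and parameter names are hypothetical, and it assumes lag is measured in records, rates in records per second, and durations in seconds.

```java
/**
 * Illustrative sketch only; NOT the Flink Kubernetes Operator's implementation.
 * All class, method, and parameter names here are hypothetical.
 */
final class TargetCapacitySketch {

    private TargetCapacitySketch() {}

    /** Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL */
    static double targetCapacity(
            double lagRecords,
            double inputRatePerSec,
            double restartSec,
            double catchUpSec,
            double targetUtilization) {
        if (catchUpSec <= 0 || targetUtilization <= 0) {
            throw new IllegalArgumentException(
                    "catchUpSec and targetUtilization must be positive");
        }
        // Rate needed to drain the existing backlog within the catch-up window.
        double lagCatchUpRate = lagRecords / catchUpSec;
        // Rate needed to drain records that arrive while the job is restarting.
        double restartBacklogRate = inputRatePerSec * restartSec / catchUpSec;
        // Steady-state rate that keeps the job at the target utilization.
        double steadyStateRate = inputRatePerSec / targetUtilization;
        return lagCatchUpRate + restartBacklogRate + steadyStateRate;
    }

    /**
     * Assumption taken from the issue text: records processed since the last
     * successful checkpoint are replayed after the restart, so they count as lag.
     */
    static double lagFromLastCheckpoint(
            double realtimeLagRecords, double recordsSinceCheckpoint) {
        return realtimeLagRecords + Math.max(0.0, recordsSinceCheckpoint);
    }

    public static void main(String[] args) {
        // Made-up numbers, chosen only to show the effect of the adjustment.
        double realtimeLag = 60_000;
        double sinceCheckpoint = 300_000;
        double inputRate = 1_000;
        double restart = 120;
        double catchUp = 600;
        double util = 0.8;

        double withoutReplay =
                targetCapacity(realtimeLag, inputRate, restart, catchUp, util);
        double withReplay = targetCapacity(
                lagFromLastCheckpoint(realtimeLag, sinceCheckpoint),
                inputRate, restart, catchUp, util);

        System.out.printf("real-time lag only:   %.1f records/s%n", withoutReplay);
        System.out.printf("lag since checkpoint: %.1f records/s%n", withReplay);
    }
}
```

With these made-up inputs the target works out to 100 + 200 + 1250 = 1550 records/s from the real-time lag alone, and 600 + 200 + 1250 = 2050 records/s once the 300,000 replayed records are counted. The gap grows with the checkpoint interval, which is the case the issue highlights.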