[ https://issues.apache.org/jira/browse/FLINK-36531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889879#comment-17889879 ]
Sai Sharath Dandi edited comment on FLINK-36531 at 10/16/24 12:11 AM:
----------------------------------------------------------------------

[~heigebupahei] I have checked the FLIP and it's exactly what we're looking for. We're interested in the future optimization that handles the case of a large checkpoint interval by rescaling early rather than delaying the scaling until the next checkpoint. Since this will be a contribution on the scheduler side rather than the Autoscaler, I will close this JIRA.

was (Author: JIRAUSER298466):
    [~heigebupahei] I have checked the FLIP and it's exactly what we're looking for. We're interested in the future optimization that handles the case of a large checkpoint interval by rescaling early rather than delaying the scaling until the next checkpoint. I will close this JIRA.

> AutoScaler needs to consider the lag from last checkpoint
> ---------------------------------------------------------
>
>                 Key: FLINK-36531
>                 URL: https://issues.apache.org/jira/browse/FLINK-36531
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Sai Sharath Dandi
>            Priority: Major
>
> Autoscaler computes the target processing capacity as
> [below|https://sg.uberinternal.com/code.uber.internal/uber-code/data-flink-kubernetes-operator@release-1.9-uber/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47]
> // Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL
>
> During a scaling action, the autoscaler restarts the job from the last
> successful checkpoint. We need to include the number of records processed
> since the last successful checkpoint as part of the lag, because those records
> will be replayed after scaling. This is particularly important for jobs with
> long checkpoint intervals and large state, where the real-time lag and the lag
> measured from the checkpoint can differ significantly.
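
For illustration, the formula quoted in the issue description and the proposed adjustment could be sketched roughly as follows. This is not the code in AutoScalerUtils: the class name, method names, parameter names, and the argument validation are all assumptions made for this example, and only the three-term formula comes from the issue text.

```java
/**
 * Hypothetical sketch of the target-capacity formula quoted in FLINK-36531.
 * All names here are illustrative assumptions, not the Flink autoscaler API.
 */
final class TargetCapacitySketch {

    private TargetCapacitySketch() {}

    /**
     * Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL
     *
     * @param lagRecords        records currently pending at the source
     * @param inputRatePerSec   incoming records per second
     * @param restartSec        expected job restart time in seconds
     * @param catchUpSec        time budget for catching up, in seconds
     * @param targetUtilization desired utilization, e.g. 0.8
     */
    static double targetCapacity(
            double lagRecords,
            double inputRatePerSec,
            double restartSec,
            double catchUpSec,
            double targetUtilization) {
        // Assumption for this sketch: reject non-positive divisors outright.
        if (catchUpSec <= 0 || targetUtilization <= 0) {
            throw new IllegalArgumentException(
                    "catchUpSec and targetUtilization must be positive");
        }
        double lagCatchUpRate = lagRecords / catchUpSec;
        double restartCatchUpRate = inputRatePerSec * restartSec / catchUpSec;
        double rateAtUtilization = inputRatePerSec / targetUtilization;
        return lagCatchUpRate + restartCatchUpRate + rateAtUtilization;
    }

    /**
     * Variant proposed in the issue: records processed since the last
     * successful checkpoint are replayed after the restart, so they are
     * added to the lag before applying the same formula.
     */
    static double targetCapacityWithReplay(
            double realtimeLagRecords,
            double recordsSinceLastCheckpoint,
            double inputRatePerSec,
            double restartSec,
            double catchUpSec,
            double targetUtilization) {
        return targetCapacity(
                realtimeLagRecords + recordsSinceLastCheckpoint,
                inputRatePerSec,
                restartSec,
                catchUpSec,
                targetUtilization);
    }
}
```

With made-up numbers (60,000 records of lag, 1,000 records/s input, a 60 s restart, a 600 s catch-up window, and 0.8 target utilization), the first method gives roughly 1,450 records/s. If 1,800,000 records had been processed since the last checkpoint, the second method gives roughly 4,450 records/s, which shows how much the replayed records can matter when checkpoint intervals are long.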