[ https://issues.apache.org/jira/browse/FLINK-36531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sai Sharath Dandi closed FLINK-36531. ------------------------------------- Resolution: Not A Problem > AutoScaler needs to consider the lag from last checkpoint > --------------------------------------------------------- > > Key: FLINK-36531 > URL: https://issues.apache.org/jira/browse/FLINK-36531 > Project: Flink > Issue Type: Improvement > Components: Autoscaler > Reporter: Sai Sharath Dandi > Priority: Major > > Autoscaler computes the target processing capacity as > [below|https://sg.uberinternal.com/code.uber.internal/uber-code/data-flink-kubernetes-operator@release-1.9-uber/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47] > // Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + > INPUT_RATE/TARGET_UTIL > > During the scaling action, the autoscaler will restart the job from the last > successful checkpoint, we need to include the number of processed records > since last successful checkpoint as part of the lag as those records will be > replayed after scaling. This is particularly important for jobs with long > checkpoint intervals and large state as there could be a significant > difference between the realtime lag and the lag from the checkpoint -- This message was sent by Atlassian Jira (v8.20.10#820010)