Re: [PR] [FLINK-34178][autoscaler] Fix the bug that observed scaling restart time is always greater than `stabilization.interval` [flink-kubernetes-operator]

via GitHub Fri, 26 Jan 2024 07:31:02 -0800


mxm commented on code in PR #759:
URL: https://github.com/apache/flink-kubernetes-operator/pull/759#discussion_r1467799834



##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java:
##########
@@ -160,14 +160,23 @@ private void runScalingLogic(Context ctx, AutoscalerFlinkMetrics autoscalerMetri
         var collectedMetrics = metricsCollector.updateMetrics(ctx, stateStore);
         var jobTopology = collectedMetrics.getJobTopology();
 
+        var now = clock.instant();

Review Comment:
   >I just checked and, unfortunately, our assumption does not seem to hold true in that regard. The loop is not triggered when the job state changes to RUNNING. I tried setting a higher reconciliation interval, and it is directly reflected in the recorded durations.
   
   There is no strict guarantee about how often we are called, but in my tests I saw intervals of 10-20 seconds, driven by incoming cluster events. In any case, the main issue here is not the reconciliation interval but the fact that we use the current instant instead of the job update time. Let's address this as Rui proposed, and we should be good to go. This feature might then be robust enough to be enabled out of the box.
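To make the difference concrete, here is a minimal Java sketch that compares ending the measurement at the current instant with ending it at the job update time. It assumes the job update time is the instant the job last switched to RUNNING and that the scaling start time is known. The class, enum, and method names (`RestartTimeSketch`, `JobState`, `restartTimeUsingNow`, `observedRestartTime`) are hypothetical and are not the operator's API, and the 30 s restart and 15 s loop delay are made-up numbers.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/**
 * Hypothetical sketch: two ways to measure how long a rescale restart took.
 * None of these names come from flink-kubernetes-operator.
 */
public class RestartTimeSketch {

    /** Simplified job states (assumption); the real operator tracks more. */
    enum JobState { RESTARTING, RUNNING }

    /**
     * Biased measurement: ends at whatever instant the autoscaler loop runs,
     * so any delay before the next reconciliation is added to the result.
     */
    static Duration restartTimeUsingNow(Instant scalingStartedAt, Instant now) {
        return Duration.between(scalingStartedAt, now);
    }

    /**
     * Measurement anchored on the job update time: ends when the job reported
     * RUNNING, independent of when the loop observes it. Returns empty while
     * the job is still restarting.
     */
    static Optional<Duration> observedRestartTime(
            Instant scalingStartedAt, JobState state, Instant jobUpdateTime) {
        if (state != JobState.RUNNING) {
            return Optional.empty();
        }
        return Optional.of(Duration.between(scalingStartedAt, jobUpdateTime));
    }

    public static void main(String[] args) {
        Instant scalingStartedAt = Instant.parse("2024-01-26T15:00:00Z");
        // Assumed: the job switches to RUNNING 30 s after the rescale starts...
        Instant jobUpdateTime = scalingStartedAt.plusSeconds(30);
        // ...but the loop only runs 15 s later, on the next reconciliation.
        Instant now = jobUpdateTime.plusSeconds(15);

        System.out.println("using now:             "
                + restartTimeUsingNow(scalingStartedAt, now));
        System.out.println("using job update time: "
                + observedRestartTime(scalingStartedAt, JobState.RUNNING, jobUpdateTime).get());
    }
}
```

Running `main` prints `PT45S` for the first measurement and `PT30S` for the second. The 15 s gap between the job reaching RUNNING and the loop observing it is exactly the error that grows with a longer reconciliation interval, which is why anchoring on the job update time removes it.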



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
