Re: [PR] [FLINK-34131] Ensure cluster.health-check.checkpoint-progress.window is well configure [flink-kubernetes-operator]

via GitHub Thu, 18 Jan 2024 05:09:39 -0800


ashangit commented on code in PR #758:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/758#discussion_r1457417705



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/ClusterHealthEvaluator.java:
##########
@@ -174,6 +175,25 @@ private boolean evaluateCheckpoints(
             return true;
         }
 
+        var completedCheckpointsCheckWindow =
+                
configuration.get(OPERATOR_CLUSTER_HEALTH_CHECK_CHECKPOINT_PROGRESS_WINDOW);
+
+        CheckpointConfig checkpointConfig = new CheckpointConfig();
+        checkpointConfig.configure(configuration);
+        var checkpointingInterval = checkpointConfig.getCheckpointInterval();
+        var checkpointingTimeout = checkpointConfig.getCheckpointTimeout();
+        var tolerationFailureNumber = 
checkpointConfig.getTolerableCheckpointFailureNumber() + 1;
+        if (completedCheckpointsCheckWindow.toMillis()
+                < Math.max(
+                        checkpointingInterval * tolerationFailureNumber,
+                        checkpointingTimeout * tolerationFailureNumber)) {
+            LOG.warn(
+                    "cluster.health-check.checkpoint-progress.window is not 
long enough. "
+                            + "It must be greater than 
max(execution.checkpointing.interval * 
execution.checkpointing.tolerable-failed-checkpoints, "
+                            + "execution.checkpointing.timeout * 
execution.checkpointing.tolerable-failed-checkpoints)");
+            return true;
+        }

Review Comment:
   Looks fine to me to default to appropriate value but I'm not sure to 
understand why defaulting to `2 * cpInterval` instead of 
`Math.max(checkpointingInterval * tolerationFailureNumber,checkpointingTimeout 
* tolerationFailureNumber)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [FLINK-34131] Ensure cluster.health-check.checkpoint-progress.window is well configure [flink-kubernetes-operator]

Reply via email to