Anton Solovev created FLINK-37354:
-------------------------------------

             Summary: Kubernetes Operator HealthCheck compatibility
                 Key: FLINK-37354
                 URL: https://issues.apache.org/jira/browse/FLINK-37354
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: 1.10.0
            Reporter: Anton Solovev

The Kubernetes Operator health check is not aligned with the checkpoint interval when the interval is set via the Java API. For example,
{code:java}
var checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setCheckpointInterval(Duration.ofHours(2).toMillis());
{code}
will lead to exceptions and, as a result, to the JobManager being restarted:
{noformat}
2025-01-28 10:15:32,435 o.a.f.k.o.l.AuditUtils [INFO ][flink-jobs/job-1] >>> Event[Job] | Warning | RESTARTUNHEALTHYJOB | Restarting unhealthy job
{noformat}
Nevertheless, there are ways to mitigate this:
# disable *kubernetes.operator.cluster.health-check.checkpoint-progress.enabled*
# set *kubernetes.operator.cluster.health-check.checkpoint-progress.window* to two hours as well
# never use the Java API to set the checkpoint interval



--
This message was sent by Atlassian Jira
(v8.20.10#820010)