Anton Solovev created FLINK-37354:
-------------------------------------
Summary: Kubernetes Operator HealthCheck compatibility
Key: FLINK-37354
URL: https://issues.apache.org/jira/browse/FLINK-37354
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: 1.10.0
Reporter: Anton Solovev
The Kubernetes Operator health check is not aligned with the checkpoint interval
when the interval is set via the Java API.
{code:java}
var checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setCheckpointInterval(Duration.ofHours(2).toMillis());
{code}
This leads to exceptions and therefore to a restart of the JobManager:
{noformat}
2025-01-28 10:15:32,435 o.a.f.k.o.l.AuditUtils [INFO ][flink-jobs/job-1] >>> Event[Job] | Warning | RESTARTUNHEALTHYJOB | Restarting unhealthy job
{noformat}
Nevertheless, there are ways to mitigate this:
# disable *kubernetes.operator.cluster.health-check.checkpoint-progress.enabled*
# set *kubernetes.operator.cluster.health-check.checkpoint-progress.window* to two hours as well
# never use the Java API to set the checkpoint interval
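
To illustrate the second mitigation, here is a minimal sketch of the mismatch. It is not operator code: the class {{HealthCheckWindowSketch}} and the method {{windowCoversInterval}} are hypothetical names, and the 5 minute default window is an assumption that should be verified against the operator documentation for your version. It only shows that a window shorter than the checkpoint interval cannot contain a completed checkpoint, while a window of two hours matches the interval from the report.

```java
import java.time.Duration;

/** Hypothetical helper, not part of the Flink Kubernetes Operator. */
final class HealthCheckWindowSketch {

    // Assumption: default of
    // kubernetes.operator.cluster.health-check.checkpoint-progress.window
    static final Duration ASSUMED_DEFAULT_WINDOW = Duration.ofMinutes(5);

    /** Returns true if the window is at least as long as the checkpoint interval. */
    static boolean windowCoversInterval(Duration window, Duration checkpointInterval) {
        return window.compareTo(checkpointInterval) >= 0;
    }

    public static void main(String[] args) {
        // Interval from the report, set through the Java API
        Duration interval = Duration.ofHours(2);

        // Assumed default window versus a window raised to two hours
        System.out.println(windowCoversInterval(ASSUMED_DEFAULT_WINDOW, interval));
        System.out.println(windowCoversInterval(Duration.ofHours(2), interval));
    }
}
```

With the assumed default window the check returns false for a two hour interval, which is consistent with the job being reported as unhealthy; with the window raised to two hours it returns true.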