[jira] [Created] (FLINK-37354) Kubernetes Operator HealthCheck compatibility

Anton Solovev (Jira) Wed, 19 Feb 2025 05:20:32 -0800

Anton Solovev created FLINK-37354:
-------------------------------------

             Summary: Kubernetes Operator HealthCheck compatibility
                 Key: FLINK-37354
                 URL: https://issues.apache.org/jira/browse/FLINK-37354
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: 1.10.0
            Reporter: Anton Solovev



The Kubernetes Operator health check is not aligned with the checkpoint interval 
when the interval is set via the Java API:
{code:java}
var checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setCheckpointInterval(Duration.ofHours(2).toMillis());
{code}
This leads to exceptions and, in turn, to a JobManager restart:
{noformat}
2025-01-28 10:15:32,435 o.a.f.k.o.l.AuditUtils         [INFO ][flink-jobs/job-1] >>> Event[Job]       | Warning | RESTARTUNHEALTHYJOB | Restarting unhealthy job
{noformat}
Nevertheless, there are ways to mitigate this:
 # Disable *kubernetes.operator.cluster.health-check.checkpoint-progress.enabled*.
 # Set *kubernetes.operator.cluster.health-check.checkpoint-progress.window* to two hours as well.
 # Never use the Java API to set the checkpoint interval.
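
The mismatch behind the second mitigation can be illustrated with a small sketch. It is a simplified model, not the operator's actual health-check code: the class name, the helper {{windowCoversInterval}}, and the 5-minute window are assumptions made for illustration. It only shows that a progress window shorter than the checkpoint interval cannot contain a completed periodic checkpoint.

```java
import java.time.Duration;

// Simplified illustration only; this is NOT the operator's implementation.
final class CheckpointWindowSketch {

    // Hypothetical helper: returns true when the health-check window is at
    // least as long as the checkpoint interval, i.e. when one periodic
    // checkpoint can be expected to fall inside the window.
    public static boolean windowCoversInterval(Duration checkpointInterval, Duration window) {
        return window.compareTo(checkpointInterval) >= 0;
    }

    public static void main(String[] args) {
        // Interval from the report: set in code, so assumed to be invisible to the operator.
        Duration interval = Duration.ofHours(2);
        // Assumed window value for the operator-side check.
        Duration assumedWindow = Duration.ofMinutes(5);

        // Window shorter than the interval: no checkpoint progress can be seen.
        System.out.println(windowCoversInterval(interval, assumedWindow));      // false
        // Window raised to two hours, as in the second mitigation.
        System.out.println(windowCoversInterval(interval, Duration.ofHours(2))); // true
    }
}
```

Under these assumptions, a job checkpointing every two hours would look stalled to any check that expects progress within a few minutes, which is consistent with the RESTARTUNHEALTHYJOB event above.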



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
