[jira] [Updated] (FLINK-37354) Kubernetes Operator HealthCheck compatibility

Anton Solovev (Jira) Wed, 19 Feb 2025 05:45:30 -0800


     [ https://issues.apache.org/jira/browse/FLINK-37354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Anton Solovev updated FLINK-37354:
----------------------------------
    Description: 
The Kubernetes Operator HealthCheck is not aligned with the checkpoint interval 
when the interval is set via the Java API.
{code:java}
var checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setCheckpointInterval(Duration.ofHours(2).toMillis());
{code}
This leads to exceptions and, as a result, to a restart of the job manager:
{noformat}
2025-01-28 10:15:32,435 o.a.f.k.o.l.AuditUtils         [INFO ][flink-jobs/job-1] >>> Event[Job]       | Warning | RESTARTUNHEALTHYJOB | Restarting unhealthy job
{noformat}
Nevertheless, there are ways to mitigate this:
 # disable *kubernetes.operator.cluster.health-check.checkpoint-progress.enabled*
 # set *kubernetes.operator.cluster.health-check.checkpoint-progress.window* to 
two hours as well
 # never use the Java API to set the checkpoint interval
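
The mismatch behind these workarounds can be sketched in plain Java. This is an illustration only, not the operator's actual implementation: the class CheckpointWindowCheck and the method windowCoversInterval are hypothetical names, and the 5-minute window is an assumed value chosen for the example. The idea is that a progress window shorter than the checkpoint interval cannot contain a completed checkpoint, so a check based on that window would presumably report the job as unhealthy.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of Flink or the Kubernetes Operator. */
public final class CheckpointWindowCheck {

    private CheckpointWindowCheck() {}

    /**
     * Returns true when the progress window is at least as long as the
     * checkpoint interval, so a checkpoint can be triggered inside the window.
     */
    public static boolean windowCoversInterval(Duration window, Duration checkpointInterval) {
        return window.compareTo(checkpointInterval) >= 0;
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofHours(2);         // interval from the report
        Duration assumedWindow = Duration.ofMinutes(5);  // assumed window, for illustration only

        // Window shorter than the interval: the comparison fails.
        System.out.println(windowCoversInterval(assumedWindow, interval));
        // Window raised to two hours, as in mitigation 2: the comparison passes.
        System.out.println(windowCoversInterval(Duration.ofHours(2), interval));
    }
}
```

With a 2-hour interval, the assumed 5-minute window fails the comparison, while a 2-hour window (the second mitigation) passes it.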


> Kubernetes Operator HealthCheck compatibility
> ---------------------------------------------
>
>                 Key: FLINK-37354
>                 URL: https://issues.apache.org/jira/browse/FLINK-37354
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: 1.10.0
>            Reporter: Anton Solovev
>            Priority: Minor
>
> The Kubernetes Operator HealthCheck is not aligned with the checkpoint 
> interval when the interval is set via the Java API.
> {code:java}
> var checkpointConfig = env.getCheckpointConfig();
> checkpointConfig.setCheckpointInterval(Duration.ofHours(2).toMillis());
> {code}
> This leads to exceptions and, as a result, to a restart of the job manager:
> {noformat}
> 2025-01-28 10:15:32,435 o.a.f.k.o.l.AuditUtils         [INFO ][flink-jobs/job-1] >>> Event[Job]       | Warning | RESTARTUNHEALTHYJOB | Restarting unhealthy job
> {noformat}
> Nevertheless, there are ways to mitigate this:
>  # disable *kubernetes.operator.cluster.health-check.checkpoint-progress.enabled*
>  # set *kubernetes.operator.cluster.health-check.checkpoint-progress.window* 
> to two hours as well
>  # never use the Java API to set the checkpoint interval



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
