Hello,

I am trying to use Flink HA mode on Kubernetes
<https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/>
with a standalone deployment in application mode
<https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/overview/#application-mode>.
The Job ID is always constant: "00000000000000000000000000000000". When we
restart the job (not from a checkpoint or savepoint), we see errors like:
"""

Caused by: org.apache.hadoop.fs.FileAlreadyExistsException:
'<PATH>/flink-checkpoints/00000000000000000000000000000000/chk-1/_metadata'
already exists

"""
even though no checkpoints have been created since the job was restarted.
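
For context, here is roughly the relevant part of our configuration (the
values below are illustrative placeholders, not our exact settings):
"""
# flink-conf.yaml (illustrative values)
kubernetes.cluster-id: my-flink-cluster
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: <PATH>/flink-ha
state.checkpoints.dir: <PATH>/flink-checkpoints
execution.checkpointing.interval: 60s
"""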

My questions:
* Is the recommended approach to set a new, unique checkpoint path every time
we update the job and recreate the relevant Kubernetes resources (i.e., when
not restoring from a checkpoint or savepoint)? Or should we garbage-collect
the old checkpoints on deletion and restore from a savepoint if required? I'm
looking for the standard recommendation.
* Is there a way to override the Job ID so that it is unique and indicates a
complete restart in HA mode? For example, something like the snippet below?
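
I believe the standalone application entrypoint accepts a --job-id argument;
the snippet below is only a sketch of what I have in mind for the JobManager
container (the job class name and ID are placeholders), not something I have
verified against Kubernetes HA:
"""
# JobManager container args (illustrative; --job-id would be a fresh
# 32-character hex ID generated on each redeploy)
args: ["standalone-job",
       "--job-classname", "com.example.MyJob",
       "--job-id", "9f47c5b2a1d34e6e8c0b7a9d2f1e3c4b"]
"""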


Thanks,
Harsh
