Hi Flink Community,

I'm running Flink jobs (standalone mode) with high availability in
Kubernetes (Flink version 1.17.2). The job is deployed with two job
managers. I noticed that the leader job manager deletes the HA ConfigMaps
when the job fails and restarts, so the standby job manager can't recover
the jobId and checkpoint from them, and the job starts with a fresh state.
However, the Flink docs
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#high-availability-data-clean-up
say that HA-related ConfigMaps are retained and the job is recovered from
the checkpoints stored in the ConfigMaps. It looks like Flink isn't
behaving as described there. Is there a config to persist the ConfigMaps
when the job fails or restarts?
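
For context, the HA-related settings I'm using look roughly like this,
following the 1.17 docs (the storage path and cluster-id below are
placeholders for my actual values):

    high-availability.type: kubernetes
    high-availability.storageDir: s3://<bucket>/flink/recovery
    kubernetes.cluster-id: <cluster-id>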

Searching via Google and ChatGPT turned up the following two configs for
keeping the ConfigMaps during job cleanup, but I can't find either of them
in the Flink docs or in the Flink code. Please advise!

high-availability.cleanup-on-shutdown

or

kubernetes.jobmanager.cleanup-ha-metadata
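
In case it's useful, I've been checking whether the HA ConfigMaps are
still present using the label selector from that docs page (cluster-id is
a placeholder for my actual value):

    kubectl get configmaps --selector='app=<cluster-id>,configmap-type=high-availability'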

Thanks,
Chen



-- 

Chen Yang
Software Engineer, Data Infrastructure

DoorDash.com <http://www.doordash.com/>
