Hi Yang,

When a job fails temporarily, e.g. due to a single machine failure, Flink retains the HA metadata and tries to recover. However, once the job has reached the terminal failed status (which is controlled by the restart strategy [1]), Flink deletes all HA metadata and exits. In your case, you might want to revise the job's restart strategy so that it does not enter the terminal failed status too quickly.
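
For illustration, here is a minimal sketch of what a more tolerant restart strategy could look like in flink-conf.yaml. It assumes the fixed-delay strategy and the key names used by Flink 1.17.x. The attempt count and delay are placeholder values, not recommendations, so please check them against the docs in [1] for your version and workload.

```yaml
# flink-conf.yaml (key names as of Flink 1.17.x; values are illustrative assumptions)

# Restart the job a fixed number of times before declaring it terminally failed.
restart-strategy: fixed-delay

# Number of restart attempts tolerated before the job enters the terminal failed status.
restart-strategy.fixed-delay.attempts: 20

# Pause between two consecutive restart attempts.
restart-strategy.fixed-delay.delay: 30 s
```

With a setting along these lines, the job only reaches the terminal failed status (and the HA metadata is only cleaned up) after the configured attempts are exhausted.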

The two options you were given are apocryphal; they do not exist in Flink. Don't trust LLMs too much :)

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#restart-strategies

Best,
Zhanghao Chen
________________________________
From: Chen Yang via user <user@flink.apache.org>
Sent: Wednesday, February 5, 2025 7:17
To: user@flink.apache.org <user@flink.apache.org>
Cc: Vignesh Chandramohan <vignesh.chandramo...@doordash.com>
Subject: Flink High Availability Data Cleanup

Hi Flink Community,

I'm running Flink jobs (standalone mode) with high availability in Kubernetes (Flink version 1.17.2). The job is deployed with two job managers. I noticed that the leader job manager deletes the ConfigMap when the job fails and restarts. As a result, the standby job manager cannot recover the jobId and checkpoint from the ConfigMap, and the job starts with a fresh state.

However, the Flink docs at https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#high-availability-data-clean-up mention that HA-related ConfigMaps are retained and that the job is recovered from the checkpoints stored in the ConfigMaps. It looks like Flink doesn't work as described.

Are there any configs to persist the ConfigMap when the job fails or restarts? While searching via Google and ChatGPT, I was recommended the following two configs to keep the ConfigMap during job cleanup, but I can't find them mentioned in the Flink docs or in the Flink code. Please advise!

high-availability.cleanup-on-shutdown
or
kubernetes.jobmanager.cleanup-ha-metadata

Thanks,
Chen

--
Chen Yang
Software Engineer, Data Infrastructure
DoorDash.com<http://www.doordash.com/>