Yang Wang created FLINK-19544:
---------------------------------

             Summary: Implement CheckpointRecoveryFactory based on Kubernetes API
                 Key: FLINK-19544
                 URL: https://issues.apache.org/jira/browse/FLINK-19544
             Project: Flink
          Issue Type: Sub-task
          Components: Deployment / Kubernetes, Runtime / Checkpointing
            Reporter: Yang Wang
             Fix For: 1.12.0

* *_CheckpointRecoveryFactory_*
** Stores meta information to ZooKeeper/ConfigMap for checkpoint recovery.
** Stores the latest checkpoint counter.

Each component (Dispatcher, ResourceManager, JobManager, RestEndpoint) will have a dedicated ConfigMap. All the HA information relevant to a specific component will be stored in a single ConfigMap. The JobManager's ConfigMap would then contain the current leader, the pointers to the checkpoints, and the checkpoint ID counter.

Since "Get (check the leader) and Update (write back to the ConfigMap)" is a transactional operation, this completely solves the concurrent modification issues without relying on the "lock-and-release" approach used with ZooKeeper.
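
To illustrate the "Get (check the leader) and Update" idea, here is a minimal, self-contained sketch. It is not Flink code and does not use a Kubernetes client: `InMemoryConfigMap`, `Snapshot`, the key names `leader` and `checkpointIdCounter`, and the method `getAndIncrementCheckpointId` are all hypothetical names made up for this example. The only thing assumed from Kubernetes is that an update carrying a stale `resourceVersion` is rejected, which is what lets the read, leader check, and write behave like one transaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ConfigMapSketch {

    /** Copy of the ConfigMap data together with the version it was read at. */
    static final class Snapshot {
        final long resourceVersion;
        final Map<String, String> data;

        Snapshot(long resourceVersion, Map<String, String> data) {
            this.resourceVersion = resourceVersion;
            this.data = data;
        }
    }

    /** Hypothetical in-memory stand-in for a ConfigMap held by the API server. */
    static final class InMemoryConfigMap {
        private long resourceVersion = 0;
        private Map<String, String> data = new HashMap<>();

        synchronized Snapshot get() {
            return new Snapshot(resourceVersion, new HashMap<>(data));
        }

        /** Writes only if nobody else has updated the map since expectedVersion was read. */
        synchronized boolean replace(long expectedVersion, Map<String, String> newData) {
            if (expectedVersion != resourceVersion) {
                return false; // stale write: the caller has to re-read and retry
            }
            data = new HashMap<>(newData);
            resourceVersion++;
            return true;
        }
    }

    // Key names are assumptions for this sketch only.
    static final String LEADER_KEY = "leader";
    static final String COUNTER_KEY = "checkpointIdCounter";

    /**
     * Get, check the leader, then update. Returns the checkpoint ID handed out,
     * or empty if the caller is not the leader recorded in the ConfigMap.
     */
    static Optional<Long> getAndIncrementCheckpointId(InMemoryConfigMap configMap, String callerId) {
        while (true) {
            Snapshot snapshot = configMap.get();
            if (!callerId.equals(snapshot.data.get(LEADER_KEY))) {
                return Optional.empty(); // lost leadership: must not write
            }
            long current = Long.parseLong(snapshot.data.getOrDefault(COUNTER_KEY, "1"));
            Map<String, String> updated = new HashMap<>(snapshot.data);
            updated.put(COUNTER_KEY, Long.toString(current + 1));
            if (configMap.replace(snapshot.resourceVersion, updated)) {
                return Optional.of(current);
            }
            // Concurrent modification detected: loop to re-read and re-check the leader.
        }
    }

    public static void main(String[] args) {
        InMemoryConfigMap configMap = new InMemoryConfigMap();
        Map<String, String> initial = new HashMap<>();
        initial.put(LEADER_KEY, "jobmanager-a");
        configMap.replace(0, initial);

        System.out.println(getAndIncrementCheckpointId(configMap, "jobmanager-a"));
        System.out.println(getAndIncrementCheckpointId(configMap, "jobmanager-b"));
    }
}
```

Because the leader check and the counter update are validated against the same version, a JobManager that has lost leadership in the meantime either sees the new leader on re-read or has its write rejected, so no separate lock node is needed in this sketch.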