Yang Wang created FLINK-19544:
---------------------------------

             Summary: Implement CheckpointRecoveryFactory based on Kubernetes API
                 Key: FLINK-19544
                 URL: https://issues.apache.org/jira/browse/FLINK-19544
             Project: Flink
          Issue Type: Sub-task
          Components: Deployment / Kubernetes, Runtime / Checkpointing
            Reporter: Yang Wang
             Fix For: 1.12.0


* *_CheckpointRecoveryFactory_*
 * Stores meta information to ZooKeeper/ConfigMap for checkpoint recovery.
 * Stores the latest checkpoint counter.

Each component (Dispatcher, ResourceManager, JobManager, RestEndpoint) will have 
a dedicated ConfigMap. All the HA information relevant to a specific component 
will be stored in a single ConfigMap. The JobManager's ConfigMap would then 
contain the current leader, the pointers to the checkpoints, and the checkpoint 
ID counter. Since "Get (check the leader) and Update (write back to the 
ConfigMap)" is a transactional operation, this completely solves the 
concurrent modification issues without relying on the "lock-and-release" 
pattern used in ZooKeeper.
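
The get-check-update cycle described above can be sketched as follows. This is
only an illustration, not the proposed implementation: FakeConfigMap is a
hypothetical in-memory stand-in for a Kubernetes ConfigMap, the key name
"leader" is made up, and the sketch assumes the write is made conditional on the
ConfigMap's resource version, which is how Kubernetes optimistic concurrency
rejects stale updates.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a ConfigMap guarded by a resource version. */
final class FakeConfigMap {
    /** Immutable view of the data together with the version it was read at. */
    static final class Snapshot {
        final Map<String, String> data;
        final long resourceVersion;

        Snapshot(Map<String, String> data, long resourceVersion) {
            this.data = data;
            this.resourceVersion = resourceVersion;
        }
    }

    private Map<String, String> data = new HashMap<>();
    private long resourceVersion = 0L;

    /** Returns a copy of the current data and the current version. */
    synchronized Snapshot get() {
        return new Snapshot(new HashMap<>(data), resourceVersion);
    }

    /** Writes newData only if the caller read the latest version; otherwise rejects. */
    synchronized boolean replace(Map<String, String> newData, long expectedVersion) {
        if (expectedVersion != resourceVersion) {
            return false; // someone else modified the ConfigMap in between
        }
        data = new HashMap<>(newData);
        resourceVersion++;
        return true;
    }
}

final class LeaderCheckedUpdate {
    static final String LEADER_KEY = "leader"; // hypothetical key name

    /**
     * Get (check the leader) and update (write back). Returns false without
     * writing if callerId is not the current leader; retries when a concurrent
     * modification made the snapshot stale.
     */
    static boolean checkAndUpdate(FakeConfigMap cm, String callerId, String key, String value) {
        while (true) {
            FakeConfigMap.Snapshot snapshot = cm.get();
            if (!callerId.equals(snapshot.data.get(LEADER_KEY))) {
                return false; // lost leadership, must not write
            }
            Map<String, String> updated = new HashMap<>(snapshot.data);
            updated.put(key, value);
            if (cm.replace(updated, snapshot.resourceVersion)) {
                return true;
            }
            // Version conflict: re-read and re-check leadership before retrying.
        }
    }
}
```

Because the leader check and the write are tied to the same version, a
contender that is no longer the leader cannot overwrite the checkpoint pointers
or the checkpoint ID counter, and no separate lock has to be acquired or
released.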



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
