[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906764#comment-16906764 ]
TisonKun commented on FLINK-10333: ---------------------------------- Hi [~till.rohrmann], I have updated the design document(attached on the issue also) and add details on how LeaderStore works and how it integrates with Dispatcher on job handling. Basically with LeaderStore we provide atomic write operation which also check leadership. Based on this encapsulation we could smoothly revisit how we persist state and how we organize pointers to this state in ZooKeeper. The document also revisits how we do leader election and store leader information that we ensure there is only one leader. On this topic, it is helpful if we involves FLINK-11843 because a cleaner component lifecycle makes reasoning fault scenarios easily. Briefly, a "new -> start -> grantLeader -> revokeLeader -> stop" lifecycle is in my mind. When investigating job status handling, I found FLINK-11813 involved, it is not only about how we persist state, but how we coordinate with external world such as client submission and job manager start/stop. It'll be nice if we re-active this thread to make effort reach a consensus or even resolve it in the coming develop cycle. > Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > ------------------------------------------------------------------------------------- > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination > Affects Versions: 1.5.3, 1.6.0, 1.7.0 > Reporter: Till Rohrmann > Priority: Major > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better > separation of concerns > * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even > if it is locked. This should not happen since it could leave another system > in an inconsistent state (imagine a changed {{JobGraph}} which restores from > an old checkpoint) > * Redundant but also somewhat inconsistent put logic in the different stores > * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} > which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}} > * Getting rid of the {{SubmittedJobGraphListener}} would be helpful > These problems made me think how reliable these components actually work. > Since these components are very important, I propose to refactor them. -- This message was sent by Atlassian JIRA (v7.6.14#76016)