[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906764#comment-16906764 ]

TisonKun commented on FLINK-10333:
----------------------------------

Hi [~till.rohrmann], I have updated the design document (also attached to the 
issue) and added details on how LeaderStore works and how it integrates with 
the Dispatcher for job handling.

Basically, with LeaderStore we provide an atomic write operation that also 
checks leadership. Based on this encapsulation we could smoothly revisit how 
we persist state and how we organize pointers to that state in ZooKeeper.
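To illustrate the idea, here is a minimal in-memory sketch of a leadership-checked atomic write. The class and method names (LeaderStoreSketch, writeIfLeader) are hypothetical, not the actual API from the design document; the point is only that the leadership check and the write happen atomically, so a leadership change cannot interleave between them.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: in a real ZooKeeper-backed store the same
// effect would be achieved with a multi-op transaction that checks the
// leader znode, not with a JVM-local lock.
public class LeaderStoreSketch {
    private final Map<String, byte[]> store = new HashMap<>();
    private String currentLeaderId;

    public LeaderStoreSketch(String initialLeaderId) {
        this.currentLeaderId = initialLeaderId;
    }

    public synchronized void grantLeadership(String leaderId) {
        this.currentLeaderId = leaderId;
    }

    // Check-and-write under one lock: the write succeeds only if the
    // caller still holds leadership at the moment of the write.
    public synchronized boolean writeIfLeader(String callerId, String key, byte[] value) {
        if (!callerId.equals(currentLeaderId)) {
            return false; // fenced out: the caller lost leadership
        }
        store.put(key, value);
        return true;
    }

    public synchronized byte[] get(String key) {
        return store.get(key);
    }
}
```

A stale leader that tries to write after a new leader has been granted is rejected instead of silently corrupting shared state.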

The document also revisits how we do leader election and store leader 
information so that we ensure there is only one leader. On this topic it 
would be helpful to involve FLINK-11843, because a cleaner component 
lifecycle makes reasoning about fault scenarios easier. Briefly, I have a 
"new -> start -> grantLeader -> revokeLeader -> stop" lifecycle in mind.
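The lifecycle above can be sketched as a small state machine. The state names and the rule that leadership can be granted and revoked repeatedly between start and stop are my assumptions for illustration; they are not taken from FLINK-11843 itself.

```java
// Hypothetical sketch of the "new -> start -> grantLeader ->
// revokeLeader -> stop" lifecycle. Illegal transitions fail fast,
// which is what makes fault scenarios easy to reason about.
public class LifecycleSketch {
    public enum State { NEW, STARTED, LEADER, STOPPED }

    private State state = State.NEW;

    public synchronized void start() {
        require(state == State.NEW, "start() is only valid from NEW");
        state = State.STARTED;
    }

    public synchronized void grantLeadership() {
        require(state == State.STARTED, "grantLeadership() is only valid from STARTED");
        state = State.LEADER;
    }

    public synchronized void revokeLeadership() {
        require(state == State.LEADER, "revokeLeadership() is only valid from LEADER");
        state = State.STARTED;
    }

    public synchronized void stop() {
        require(state != State.STOPPED, "component is already stopped");
        state = State.STOPPED;
    }

    public synchronized State state() {
        return state;
    }

    private static void require(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }
}
```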

When investigating job status handling, I found FLINK-11813 is involved; it 
is not only about how we persist state, but also about how we coordinate 
with the external world, such as client submission and job manager 
start/stop.

It would be nice if we reactivated this thread to make an effort to reach a 
consensus, or even resolve it in the coming development cycle.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10333
>                 URL: https://issues.apache.org/jira/browse/FLINK-10333
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me question how reliably these components actually 
> work. Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
