[ https://issues.apache.org/jira/browse/FLINK-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356449#comment-16356449 ]
ASF GitHub Bot commented on FLINK-8533: --------------------------------------- GitHub user EronWright opened a pull request: https://github.com/apache/flink/pull/5427 [FLINK-8533] [checkpointing] Support MasterTriggerRestoreHook state reinitialization Signed-off-by: Eron Wright <eronwri...@gmail.com> ## What is the purpose of the change Support MasterTriggerRestoreHook state re-initialization, to eliminate an edge case involving execution restarts where no checkpoint state is available. ## Brief change log - extend `MasterTriggerRestoreHook` with `initializeState` method. - invoke `initializeState` upon initial execution and upon global restart. ## Verifying this change - Revised test: `CheckpointCoordinatorMasterHooksTest` ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **yes** - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes** - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no You can merge this pull request into a Git repository by running: $ git pull https://github.com/EronWright/flink FLINK-8533-hook-initialization Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5427.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5427 ---- commit 9ad0e03c1aa81012ae14f598acdcf3eb76c9ec9f Author: Eron Wright <eronwright@...> Date: 2018-02-08T03:38:47Z [FLINK-8533] Support MasterTriggerRestoreHook state reinitialization - extend `MasterTriggerRestoreHook` with `initializeState` method. - invoke `initializeState` upon initial execution and upon global restart. Signed-off-by: Eron Wright <eronwri...@gmail.com> ---- > Support MasterTriggerRestoreHook state reinitialization > ------------------------------------------------------- > > Key: FLINK-8533 > URL: https://issues.apache.org/jira/browse/FLINK-8533 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing > Affects Versions: 1.3.0 > Reporter: Eron Wright > Assignee: Eron Wright > Priority: Major > > {{MasterTriggerRestoreHook}} enables coordination with an external system for > taking or restoring checkpoints. When execution is restarted from a > checkpoint, {{restoreCheckpoint}} is called to restore or reinitialize the > external system state. There's an edge case where the external state is not > adequately reinitialized, that is when execution fails _before the first > checkpoint_. In that case, the hook is not invoked and has no opportunity to > restore the external state to initial conditions. > The impact is a loss of exactly-once semantics in this case. For example, in > the Pravega source function, the reader group state (e.g. stream position > data) is stored externally. In the normal restore case, the reader group > state is forcibly rewound to the checkpointed position. In the edge case > where no checkpoint has yet been successful, the reader group state is not > rewound and consequently some amount of stream data is not reprocessed. > A possible fix would be to introduce an {{initializeState}} method on the > hook interface. Similar to {{CheckpointedFunction::initializeState}}, this > method would be invoked unconditionally upon hook initialization. The Pravega > hook would, for example, initialize or forcibly reinitialize the reader group > state. -- This message was sent by Atlassian JIRA (v7.6.3#76005)