[ https://issues.apache.org/jira/browse/FLINK-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949379#comment-16949379 ]
Piotr Nowojski commented on FLINK-14344:
----------------------------------------

{{MasterTriggerRestoreHook}} is not marked {{PublicEvolving}}, but it clearly should be, since {{WithMasterCheckpointHook}} is marked that way and clearly depends on it. The good news is that this makes it effectively {{PublicEvolving}}, so I think we can change it. For example, we could provide two new methods (one synchronous, the other asynchronous), both named differently from the current one, so that users get an immediate compile error telling them that something has changed.

However, I don't fully understand the problem yet. I'm not sure why we should execute the hooks in the IO executor. Why not in the master thread? In particular, synchronous hooks are supposed to fire before the checkpoint is triggered, so they cannot be long-running operations. Long-running operations can be executed asynchronously in some user-controlled thread.

Second, I don't understand the current code for executing the hooks. {{MasterHooks#triggerMasterHooks()}} is called by the {{CheckpointCoordinator}} under the global lock, one hook at a time, and inside {{MasterHooks#triggerHook}} we always wait synchronously, with a timeout, for the {{CompletableFuture<T> resultFuture;}} to complete. It looks like there is literally no difference between synchronous and asynchronous hooks...

> Snapshot master hook state asynchronously
> -----------------------------------------
>
>                 Key: FLINK-14344
>                 URL: https://issues.apache.org/jira/browse/FLINK-14344
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Biao Liu
>            Assignee: Biao Liu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently we snapshot the master hook state synchronously. As a part of
> reworking the threading model of {{CheckpointCoordinator}}, we have to make this
> non-blocking to satisfy the requirement of running in the main thread.
> The behavior of snapshotting master hook state is similar to task state
> snapshotting.
> Master state snapshotting is taken before task state
> snapshotting, because a master hook might perform external system
> initialization that task state snapshotting might depend on.
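
The last point in the comment, that a caller which always blocks on the hook's future cannot tell synchronous and asynchronous hooks apart, can be illustrated with a minimal Java sketch. Everything here is hypothetical: {{Hook}}, {{triggerAndWait}} and {{HookWaitSketch}} are illustrative names and are not Flink's {{MasterTriggerRestoreHook}} or {{MasterHooks}} code. The sketch only assumes the pattern described above, a {{CompletableFuture}} that is awaited with a timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HookWaitSketch {

    /** Hypothetical stand-in for a master hook; NOT Flink's actual interface. */
    interface Hook<T> {
        // May return an already completed future (a "synchronous" hook) or a
        // future completed later on another thread (an "asynchronous" hook).
        CompletableFuture<T> trigger(long checkpointId);
    }

    /** Blocks the calling thread until the hook's future completes or times out. */
    static <T> T triggerAndWait(Hook<T> hook, long checkpointId, long timeoutMillis) {
        CompletableFuture<T> resultFuture = hook.trigger(checkpointId);
        try {
            return resultFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for hook", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("hook failed or timed out", e);
        }
    }

    public static void main(String[] args) {
        Hook<String> syncHook = id -> CompletableFuture.completedFuture("sync-" + id);
        Hook<String> asyncHook = id -> CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(200); // simulated slow work on another thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "async-" + id;
        });

        long start = System.nanoTime();
        String first = triggerAndWait(syncHook, 1L, 1000L);
        String second = triggerAndWait(asyncHook, 1L, 1000L); // caller still blocks here
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(first + ", " + second + ", caller blocked for ~" + elapsedMs + " ms");
    }
}
```

In this sketch the caller is held up for the full duration of the asynchronous hook, just as it would be for a slow synchronous one. If the caller also held a lock while waiting, the asynchronous variant would bring no benefit, which is the concern raised in the comment.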