[ 
https://issues.apache.org/jira/browse/FLINK-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949379#comment-16949379
 ] 

Piotr Nowojski commented on FLINK-14344:
----------------------------------------

{{MasterTriggerRestoreHook}} is not marked {{PublicEvolving}}, but it clearly 
should be, since {{WithMasterCheckpointHook}} is marked that way and depends 
on the former.

The good news is that this makes it effectively {{PublicEvolving}}, so I think 
we can change it. For example, we could provide two new methods (one 
synchronous, the other asynchronous), both named differently from the current 
one, so that users get an immediate compile error telling them that something 
has changed.
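
As a rough sketch of that idea, the split could look something like the 
following. All names and signatures here ({{MasterHookSketch}}, 
{{snapshotSync}}, {{snapshotAsync}}) are hypothetical and are not existing 
Flink API; the only point is that neither method reuses the current name, so 
existing implementations stop compiling until they are adapted.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical replacement interface; names and signatures are illustrative only.
interface MasterHookSketch<T> {
    // Runs before the checkpoint is triggered, so it should return quickly.
    T snapshotSync(long checkpointId, long timestamp);

    // Longer-running work goes here, on an executor supplied by the caller.
    CompletableFuture<T> snapshotAsync(long checkpointId, long timestamp, Executor executor);
}

// Example implementation used only to show both paths.
class ExampleHook implements MasterHookSketch<String> {
    @Override
    public String snapshotSync(long checkpointId, long timestamp) {
        return "sync-" + checkpointId;
    }

    @Override
    public CompletableFuture<String> snapshotAsync(
            long checkpointId, long timestamp, Executor executor) {
        return CompletableFuture.supplyAsync(() -> "async-" + checkpointId, executor);
    }
}
```

Whether both methods should be abstract, or one should get a default 
implementation, is a separate design question that this sketch does not try to 
settle.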

However, I don't fully understand the problem yet. I'm not sure why we should 
execute the hooks in the IO Executor rather than in the master thread, 
especially since synchronous hooks are supposed to be fired before the 
checkpoint is triggered, so they cannot be long-running operations. 
Long-running operations can be executed asynchronously in some user-controlled 
thread.

Second, I don't understand the current code for executing the hooks. 
{{MasterHooks#triggerMasterHooks()}} is called by the {{CheckpointCoordinator}} 
under the global lock, one hook at a time, and inside 
{{MasterHooks#triggerHook}} we always wait synchronously, with a timeout, for 
the {{CompletableFuture<T> resultFuture;}} to complete. It looks like there is 
literally no difference between synchronous and asynchronous hooks...
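
To make the concern concrete, here is a minimal standalone sketch of the 
pattern described above. It is not the actual Flink code: 
{{BlockingHookSketch}}, {{LOCK}}, and the {{Supplier}}-based hook type are 
stand-ins invented for illustration. It only shows that blocking on each 
future while holding a lock makes an "asynchronous" hook behave exactly like a 
synchronous one from the caller's point of view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative only: none of these names are Flink classes.
class BlockingHookSketch {
    // Stand-in for the coordinator's global lock.
    private static final Object LOCK = new Object();

    // Starts one hook and blocks until its future completes or the timeout expires.
    static <T> T triggerHook(Supplier<CompletableFuture<T>> hook, long timeoutMillis) {
        CompletableFuture<T> resultFuture = hook.get();
        try {
            return resultFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("hook failed or timed out", e);
        }
    }

    // Runs the hooks one by one while holding the lock the whole time.
    static <T> List<T> triggerAll(List<Supplier<CompletableFuture<T>>> hooks, long timeoutMillis) {
        List<T> results = new ArrayList<>();
        synchronized (LOCK) {
            for (Supplier<CompletableFuture<T>> hook : hooks) {
                results.add(triggerHook(hook, timeoutMillis));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Supplier<CompletableFuture<String>> syncHook =
                () -> CompletableFuture.completedFuture("sync-state");
        Supplier<CompletableFuture<String>> asyncHook =
                () -> CompletableFuture.supplyAsync(() -> "async-state");
        // Both hooks are awaited in order; the caller cannot tell them apart.
        System.out.println(triggerAll(List.of(syncHook, asyncHook), 1_000));
    }
}
```

In this sketch the lock is held for as long as the slowest hook takes (up to 
the timeout), which is the behavior that looks problematic if the caller is 
supposed to be non-blocking.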

> Snapshot master hook state asynchronously
> -----------------------------------------
>
>                 Key: FLINK-14344
>                 URL: https://issues.apache.org/jira/browse/FLINK-14344
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Biao Liu
>            Assignee: Biao Liu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently we snapshot the master hook state synchronously. As part of 
> reworking the threading model of {{CheckpointCoordinator}}, we have to make 
> this non-blocking to satisfy the requirement of running in the main thread.
> Snapshotting the master hook state is similar to snapshotting task state. 
> The master state snapshot is taken before the task state snapshot, because a 
> master hook might initialize an external system that task state snapshotting 
> depends on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
