Eron Wright  created FLINK-8533:
-----------------------------------

             Summary: Support MasterTriggerRestoreHook state reinitialization
                 Key: FLINK-8533
                 URL: https://issues.apache.org/jira/browse/FLINK-8533
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.3.0
            Reporter: Eron Wright 
            Assignee: Eron Wright 


{{MasterTriggerRestoreHook}} enables coordination with an external system for 
taking or restoring checkpoints. When execution is restarted from a checkpoint, 
{{restoreCheckpoint}} is called to restore or reinitialize the external system 
state. There is an edge case in which the external state is not adequately 
reinitialized: when execution fails _before the first checkpoint_ completes. In 
that case, the hook is never invoked and has no opportunity to restore the 
external state to its initial conditions.

The impact is a loss of exactly-once semantics in this case. For example, in 
the Pravega source function, the reader group state (e.g. stream position data) 
is stored externally. In the normal restore case, the reader group state is 
forcibly rewound to the checkpointed position. In the edge case where no 
checkpoint has completed yet, the reader group state is not rewound, and 
consequently the stream data read before the failure is not reprocessed.
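
To make the failure mode concrete, here is a minimal simulation of it. It is a sketch only and uses no Flink or Pravega APIs: {{ExternalReaderGroup}} and {{RestartSimulation}} are hypothetical names, and the restart logic is just a model of the behavior described above, in which the restore path runs only when a completed checkpoint exists.

```java
// Hypothetical stand-in for externally stored reader state (e.g. a stream position).
class ExternalReaderGroup {
    long position = 0;

    // Reading an event advances the externally stored position.
    long readNext() {
        return position++;
    }

    void rewindTo(long checkpointedPosition) {
        position = checkpointedPosition;
    }
}

class RestartSimulation {
    // Models the behavior described in this issue (an assumption, not Flink code):
    // the external state is rewound only if some checkpoint has completed.
    static void restart(ExternalReaderGroup group, Long checkpointedPosition) {
        if (checkpointedPosition != null) {
            group.rewindTo(checkpointedPosition);
        }
        // No completed checkpoint: nothing is invoked, so the position stays advanced.
    }

    public static void main(String[] args) {
        ExternalReaderGroup group = new ExternalReaderGroup();
        for (int i = 0; i < 5; i++) {
            group.readNext(); // events 0..4 are read, then the job fails
        }
        restart(group, null); // failure before the first checkpoint
        System.out.println("position after restart: " + group.position);
    }
}
```

In this model the job's own state starts from scratch after the restart, but the reader group resumes past the events already read, so those events are never seen again.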

A possible fix would be to introduce an {{initializeState}} method on the hook 
interface. Similar to {{CheckpointedFunction::initializeState}}, this method 
would be invoked unconditionally upon hook initialization. The Pravega hook 
would, for example, initialize or forcibly reinitialize the reader group state. 
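
A rough sketch of what the proposal could look like follows. Everything here is an assumption for illustration: {{SketchHook}} is a simplified stand-in and not the real {{MasterTriggerRestoreHook}} signature, and {{ReaderGroup}}, {{ReaderGroupHook}} and {{SketchCoordinator}} are hypothetical names. The default no-op method is one way to keep existing hook implementations source-compatible.

```java
// Simplified stand-in for the hook interface; NOT the real Flink interface.
interface SketchHook<T> {
    // Proposed addition (hypothetical): invoked unconditionally when the hook is
    // initialized. The default no-op leaves existing hooks unaffected.
    default void initializeState() {
    }

    // Existing behavior: invoked only when a completed checkpoint is restored.
    void restoreCheckpoint(long checkpointId, T checkpointData);
}

// Hypothetical external system state, e.g. a reader group's stream position.
class ReaderGroup {
    long position = 0;
}

// Hypothetical hook that resets the reader group on every initialization.
class ReaderGroupHook implements SketchHook<Long> {
    private final ReaderGroup group;
    private final long initialPosition;

    ReaderGroupHook(ReaderGroup group, long initialPosition) {
        this.group = group;
        this.initialPosition = initialPosition;
    }

    @Override
    public void initializeState() {
        // Forcibly reinitialize the external state to its initial conditions.
        group.position = initialPosition;
    }

    @Override
    public void restoreCheckpoint(long checkpointId, Long checkpointData) {
        // Rewind the external state to the checkpointed position.
        group.position = checkpointData;
    }
}

// Hypothetical model of what the coordinator would do on (re)start.
class SketchCoordinator {
    static <T> void restart(SketchHook<T> hook, Long latestCheckpointId, T checkpointData) {
        hook.initializeState(); // always runs, even before the first checkpoint
        if (latestCheckpointId != null) {
            hook.restoreCheckpoint(latestCheckpointId, checkpointData);
        }
    }
}
```

With this ordering, a failure before the first checkpoint still resets the reader group to its initial position, while the normal restore path subsequently overrides that with the checkpointed position.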
   



