Till Rohrmann created FLINK-10855:
-------------------------------------

             Summary: CheckpointCoordinator does not delete checkpoint directory of late/failed checkpoints
                 Key: FLINK-10855
                 URL: https://issues.apache.org/jira/browse/FLINK-10855
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.6.2, 1.5.5, 1.7.0
            Reporter: Till Rohrmann


If an acknowledge checkpoint message arrives late or a checkpoint cannot be 
acknowledged, we discard the subtask state in the {{CheckpointCoordinator}}. 
What does not happen in this case is the deletion of the parent directory of 
the checkpoint. That only happens in {{PendingCheckpoint#dispose}}. 

Due to this behaviour it can happen that a checkpoint fails (e.g. because a 
task is not ready) and we delete the checkpoint directory. Next, another task 
writes its checkpoint data to the checkpoint directory (thereby creating it 
again) and sends an acknowledge message back to the {{CheckpointCoordinator}}. 
The {{CheckpointCoordinator}} will realize that there is no longer a 
{{PendingCheckpoint}} and will discard the subtask state. This will remove the 
state files from the checkpoint directory but will leave the checkpoint 
directory itself untouched.
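
The sequence above can be reproduced with a minimal sketch that uses only 
{{java.nio.file}} and no Flink classes. The class name 
{{LateCheckpointSketch}}, the directory name {{chk-42}}, the file name 
{{subtask-0-state}} and the {{removeEmptyParent}} flag are all assumptions made 
for illustration; the flag shows one possible mitigation (deleting the 
directory once it is empty) and is not necessarily how the issue should be 
fixed in the {{CheckpointCoordinator}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LateCheckpointSketch {

    /**
     * Simulates a late acknowledgement on the local file system.
     * Returns true if the checkpoint directory is left behind.
     * All names here are hypothetical and only mirror the description above.
     */
    public static boolean directoryLeftBehind(boolean removeEmptyParent) {
        try {
            Path root = Files.createTempDirectory("checkpoints");
            Path checkpointDir = root.resolve("chk-42");

            // 1. The checkpoint is triggered and its directory is created.
            Files.createDirectories(checkpointDir);

            // 2. The checkpoint fails; disposing the pending checkpoint
            //    deletes the (still empty) checkpoint directory.
            Files.delete(checkpointDir);

            // 3. Another task still writes its state, recreating the directory.
            Files.createDirectories(checkpointDir);
            Path stateFile = Files.createFile(checkpointDir.resolve("subtask-0-state"));

            // 4. The late acknowledgement arrives, no pending checkpoint exists,
            //    and only the subtask state file is discarded.
            Files.delete(stateFile);

            // Assumed mitigation: also remove the directory if it is now empty.
            if (removeEmptyParent && isEmpty(checkpointDir)) {
                Files.delete(checkpointDir);
            }

            boolean leftBehind = Files.exists(checkpointDir);

            // Clean up the temporary files created by this sketch.
            Files.deleteIfExists(checkpointDir);
            Files.delete(root);
            return leftBehind;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean isEmpty(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }

    public static void main(String[] args) {
        System.out.println("without parent cleanup, directory left behind: "
                + directoryLeftBehind(false));
        System.out.println("with parent cleanup, directory left behind: "
                + directoryLeftBehind(true));
    }
}
```

Calling {{directoryLeftBehind(false)}} follows the behaviour described in this 
issue, while {{directoryLeftBehind(true)}} additionally removes the emptied 
directory after the state file has been discarded.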



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
