[ https://issues.apache.org/jira/browse/FLINK-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019663#comment-16019663 ]
Till Rohrmann commented on FLINK-6328:
--------------------------------------

Given that the lifecycle of a savepoint is outside the control of the {{CheckpointCoordinator}}, I think it is best not to add savepoints to the {{CompletedCheckpointStore}} and, thus, not to consider them for job recovery. The reason for this is FLINK-4815, because otherwise a single broken or deleted savepoint would thwart Flink's whole recovery mechanism.

Once FLINK-4815 has been added, we might think again about re-adding savepoints to the {{CompletedCheckpointStore}} and, thus, allowing recovery from savepoints in case of failures. When doing so, we should, however, not count the savepoints towards the number of retained checkpoints, because we cannot be sure that they still exist.

> Savepoints must not be counted as retained checkpoints
> ------------------------------------------------------
>
>                 Key: FLINK-6328
>                 URL: https://issues.apache.org/jira/browse/FLINK-6328
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stephan Ewen
>            Assignee: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.3.0, 1.2.2
>
>
> The Checkpoint Store retains the *n* latest checkpoints.
> Savepoints are counted as well, meaning that for settings with 1 retained
> checkpoint, there are sometimes no retained checkpoints at all, only a
> savepoint.
> That is dangerous, because savepoints must be assumed to disappear at any
> point in time - their lifecycle is out of control of the
> CheckpointCoordinator.
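
To illustrate the retention behaviour discussed above, here is a minimal, self-contained sketch of a store that keeps the *n* latest checkpoints and never admits savepoints. It is not Flink's actual {{CompletedCheckpointStore}}; the names {{RetainedCheckpointSketch}}, {{Completed}}, {{Store}} and {{retainedIds}} are hypothetical and exist only for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch only; this is NOT Flink's CompletedCheckpointStore. */
public class RetainedCheckpointSketch {

    /** Minimal stand-in for a completed checkpoint or savepoint (assumed type). */
    static final class Completed {
        final long id;
        final boolean isSavepoint;

        Completed(long id, boolean isSavepoint) {
            this.id = id;
            this.isSavepoint = isSavepoint;
        }
    }

    /** Retains the n latest checkpoints; savepoints are never stored or counted. */
    static final class Store {
        private final int maxRetained;
        private final Deque<Completed> checkpoints = new ArrayDeque<>();

        Store(int maxRetained) {
            if (maxRetained < 1) {
                throw new IllegalArgumentException("maxRetained must be >= 1");
            }
            this.maxRetained = maxRetained;
        }

        void add(Completed completed) {
            if (completed.isSavepoint) {
                // The savepoint's lifecycle is outside our control, so it must
                // neither be used for recovery nor evict a real checkpoint.
                return;
            }
            checkpoints.addLast(completed);
            while (checkpoints.size() > maxRetained) {
                checkpoints.removeFirst(); // drop the oldest checkpoint
            }
        }

        /** IDs of the retained checkpoints, oldest first. */
        List<Long> retainedIds() {
            List<Long> ids = new ArrayList<>();
            for (Completed c : checkpoints) {
                ids.add(c.id);
            }
            return ids;
        }
    }

    public static void main(String[] args) {
        Store store = new Store(1);
        store.add(new Completed(1, false)); // regular checkpoint
        store.add(new Completed(2, true));  // savepoint, ignored by the store
        System.out.println(store.retainedIds()); // prints [1]
    }
}
```

With one retained checkpoint configured, a later savepoint does not evict checkpoint 1, so recovery always has a checkpoint whose lifecycle the store controls.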