[ https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380873#comment-15380873 ]
ramkrishna.s.vasudevan commented on FLINK-3397:
-----------------------------------------------

Thanks for the valuable feedback.

>> This is also configurable, you can keep around multiple completed checkpoints.

OK, got that. Will update the doc.

bq. I would remove the last part "and create a new one" as this is independent of when savepoints are cleared. The important thing is that they are not automatically cleared.

OK.

bq. This is not checked automatically, but the user provides the savepoint path to resume from.

Yup, that's what I intended here.

bq. if a job was submitted with a savepoint path to recover from, it will always fall back to that state in the worst case. What does not happen is that it is falling back to any newer savepoints (even if some were triggered).

I read "worst case" as meaning that if checkpoint restoration fails, the savepoint path will be used. But if a newer savepoint has been taken by then, only that newer one should be used, right? The problem is that because we try to restore from the checkpoint first, we tend to miss any newer savepoints.

bq. They are currently mostly independent of the job from which they were created.

Can you elaborate on this? Are you suggesting that there should be a mapping between the savepoint and the job?

> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-3397
>                 URL: https://issues.apache.org/jira/browse/FLINK-3397
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing, Streaming
>    Affects Versions: 1.0.0
>            Reporter: Gyula Fora
>            Priority: Minor
>         Attachments: FLINK-3397.pdf
>
>
> The current fallback behaviour in case of a streaming job failure is slightly
> counterintuitive:
> If a job fails, it will fall back to the most recent checkpoint (if any), even
> if more recent savepoints were taken.
> This means that savepoints are not regarded as checkpoints by the system,
> only as points from which a job can be manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints
> in case of a failure and are also used to automatically restore the
> streaming job.
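The difference between the current fallback and the one proposed in the issue can be sketched as two selection policies. This is a minimal illustration only, assuming a hypothetical `RestorePoint` record ordered by the time it was taken; `RestorePoint`, `Kind`, `currentFallback` and `proposedFallback` are made-up names, not Flink's recovery code or API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class FallbackSketch {

    // Hypothetical kind of restore point (not a Flink type).
    enum Kind { CHECKPOINT, SAVEPOINT }

    // Hypothetical restore point: where the state lives and when it was taken.
    record RestorePoint(Kind kind, String path, long timestampMillis) {}

    // Current behaviour as described in the issue: use the latest completed
    // checkpoint; only if none exists, fall back to the savepoint the job was
    // submitted with. Newer savepoints are never considered.
    static Optional<RestorePoint> currentFallback(List<RestorePoint> checkpoints,
                                                  Optional<RestorePoint> submittedSavepoint) {
        Optional<RestorePoint> latestCheckpoint = checkpoints.stream()
                .max(Comparator.comparingLong(RestorePoint::timestampMillis));
        return latestCheckpoint.isPresent() ? latestCheckpoint : submittedSavepoint;
    }

    // Proposed behaviour: treat savepoints like checkpoints and restore from
    // whichever restore point is the most recent overall.
    static Optional<RestorePoint> proposedFallback(List<RestorePoint> checkpoints,
                                                   List<RestorePoint> savepoints) {
        List<RestorePoint> all = new ArrayList<>(checkpoints);
        all.addAll(savepoints);
        return all.stream().max(Comparator.comparingLong(RestorePoint::timestampMillis));
    }

    public static void main(String[] args) {
        List<RestorePoint> checkpoints = List.of(
                new RestorePoint(Kind.CHECKPOINT, "chk-1", 1_000L),
                new RestorePoint(Kind.CHECKPOINT, "chk-2", 2_000L));
        List<RestorePoint> savepoints = List.of(
                new RestorePoint(Kind.SAVEPOINT, "savepoint-a", 3_000L));

        // Current policy: chk-2 wins even though savepoint-a is newer.
        System.out.println(currentFallback(checkpoints, Optional.empty()).get().path());
        // Proposed policy: savepoint-a wins because it is the newest restore point.
        System.out.println(proposedFallback(checkpoints, savepoints).get().path());
    }
}
```

Running `main` prints `chk-2` and then `savepoint-a`. Ordering by timestamp is itself an assumption made for the sketch; any monotonically increasing identifier shared by checkpoints and savepoints would serve the same purpose.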