[jira] [Commented] (FLINK-3397) Failed streaming jobs should fall back to the most recent checkpoint/savepoint

ramkrishna.s.vasudevan (JIRA) Sat, 16 Jul 2016 11:08:36 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380873#comment-15380873
 ]


ramkrishna.s.vasudevan commented on FLINK-3397:
-----------------------------------------------

Thanks for the valuable feedback.
>>This is also configurable, you can keep around multiple completed checkpoints.
Ok, got it. I will update the doc.
bq. I would remove the last part "and create a new one" as this is independent 
of when savepoints are cleared. The important thing is that they are not 
automatically cleared.
Ok
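To make the retention semantics discussed above concrete, here is a rough sketch. It assumes a hypothetical in-memory `SnapshotStore` (not Flink's API, and the names are illustrative only). It caps the number of completed checkpoints that are kept, while savepoints are kept until the user explicitly disposes of them.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class SnapshotStore:
    """Hypothetical model, not Flink's API: completed checkpoints are kept in a
    bounded FIFO, savepoints are kept until explicitly disposed."""
    max_retained_checkpoints: int = 1
    checkpoints: Deque[int] = field(default_factory=deque)
    savepoints: List[int] = field(default_factory=list)

    def add_checkpoint(self, checkpoint_id: int) -> None:
        self.checkpoints.append(checkpoint_id)
        # Drop the oldest completed checkpoints beyond the configured limit.
        while len(self.checkpoints) > self.max_retained_checkpoints:
            self.checkpoints.popleft()

    def add_savepoint(self, savepoint_id: int) -> None:
        # Savepoints are never cleared automatically.
        self.savepoints.append(savepoint_id)

    def dispose_savepoint(self, savepoint_id: int) -> None:
        # Only an explicit user action removes a savepoint.
        self.savepoints.remove(savepoint_id)
```

With `max_retained_checkpoints=2`, completing checkpoints 1 to 4 leaves only 3 and 4, while any savepoint stays until `dispose_savepoint` is called.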
bq. This is not checked automatically, but the user provides the savepoint path 
to resume from.
Yup, that's what I intended here.
bq. if a job was submitted with a savepoint path to recover from, it will 
always fall back to that state in the worst case. What does not happen is that 
it is falling back to any newer savepoints (even if some were triggered).
I think "worst case" here means that if the checkpoint restoration fails, the 
savepoint path will be used. But if a newer savepoint had been taken by then, 
only that one would be used, right? The problem is that since we try to restore 
from the checkpoint first, we tend to miss any newer savepoints.
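As a rough illustration of the difference being discussed, the sketch below contrasts the current fallback with the behaviour proposed in this issue. The `Snapshot` type and both functions are hypothetical names for illustration, not Flink code. It also assumes every completed snapshot carries a comparable completion timestamp.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    kind: str       # "checkpoint" or "savepoint"
    path: str       # hypothetical storage location
    timestamp: int  # completion time, e.g. ms since epoch


def current_fallback(checkpoints: List[Snapshot],
                     initial_savepoint: Optional[Snapshot]) -> Optional[Snapshot]:
    """Behaviour described in the issue: use the latest completed checkpoint,
    otherwise the savepoint the job was submitted with. Newer savepoints are ignored."""
    if checkpoints:
        return max(checkpoints, key=lambda s: s.timestamp)
    return initial_savepoint


def proposed_fallback(checkpoints: List[Snapshot],
                      savepoints: List[Snapshot],
                      initial_savepoint: Optional[Snapshot]) -> Optional[Snapshot]:
    """Proposal in this issue: treat savepoints like checkpoints and restore
    from whichever completed snapshot is most recent."""
    candidates = list(checkpoints) + list(savepoints)
    if initial_savepoint is not None:
        candidates.append(initial_savepoint)
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.timestamp)
```

Suppose a job was submitted with a savepoint taken at time 100, completed a checkpoint at 200 and had another savepoint triggered at 300. In that case, `current_fallback` restores the checkpoint from 200, while `proposed_fallback` picks the savepoint from 300.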
bq. They are currently mostly independent of the job from which they were 
created.
Can you elaborate on this? Are you suggesting that there should be a mapping 
between a savepoint and the job it was created from?





> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-3397
>                 URL: https://issues.apache.org/jira/browse/FLINK-3397
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing, Streaming
>    Affects Versions: 1.0.0
>            Reporter: Gyula Fora
>            Priority: Minor
>         Attachments: FLINK-3397.pdf
>
>
> The current fallback behaviour in case of a streaming job failure is slightly 
> counterintuitive:
> If a job fails, it will fall back to the most recent checkpoint (if any), even 
> if more recent savepoints were taken. This means that the system does not 
> regard savepoints as checkpoints, only as points from which a job can be 
> manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints 
> in case of a failure and are also used to automatically restore the 
> streaming job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
