[jira] [Commented] (FLINK-3397) Failed streaming jobs should fall back to the most recent checkpoint/savepoint

ramkrishna.s.vasudevan (JIRA) Fri, 24 Jun 2016 06:46:22 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348283#comment-15348283
 ]


ramkrishna.s.vasudevan commented on FLINK-3397:
-----------------------------------------------

I looked at this JIRA and the one before it, 
https://issues.apache.org/jira/browse/FLINK-3390.
As currently implemented, the savepoint coordinator is consulted only when 
no checkpoint can be retrieved from the checkpoint coordinator and a 
savepoint restore path is set.
So this JIRA expects us to compare the latest savepoint with the latest 
checkpoint and, if the savepoint is more recent, restore from the savepoint 
without consulting the checkpoint coordinator at all. Is that right?
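
To make the question concrete, here is a minimal sketch of the two behaviours 
as I understand them. It is not taken from the Flink code base: the names 
RestoreChoice, Snapshot, currentFallback and proposedFallback are hypothetical, 
and the timestamps are made up for illustration.

```java
import java.util.Optional;

final class RestoreChoice {

    /** Hypothetical stand-in for a completed checkpoint or savepoint (not a Flink class). */
    static final class Snapshot {
        final String kind;       // "checkpoint" or "savepoint"
        final long timestampMs;  // completion time in milliseconds

        Snapshot(String kind, long timestampMs) {
            this.kind = kind;
            this.timestampMs = timestampMs;
        }
    }

    /** Behaviour described above: use the savepoint only if no checkpoint is available. */
    static Optional<Snapshot> currentFallback(Optional<Snapshot> checkpoint,
                                              Optional<Snapshot> savepoint) {
        return checkpoint.isPresent() ? checkpoint : savepoint;
    }

    /** Behaviour this issue seems to ask for: restore from whichever one is more recent. */
    static Optional<Snapshot> proposedFallback(Optional<Snapshot> checkpoint,
                                               Optional<Snapshot> savepoint) {
        if (!checkpoint.isPresent()) {
            return savepoint;
        }
        if (!savepoint.isPresent()) {
            return checkpoint;
        }
        // Assumption: on a tie, prefer the checkpoint.
        return savepoint.get().timestampMs > checkpoint.get().timestampMs
                ? savepoint
                : checkpoint;
    }

    public static void main(String[] args) {
        // Made-up example: the savepoint was taken after the last checkpoint.
        Optional<Snapshot> checkpoint = Optional.of(new Snapshot("checkpoint", 100L));
        Optional<Snapshot> savepoint = Optional.of(new Snapshot("savepoint", 200L));

        System.out.println(currentFallback(checkpoint, savepoint).get().kind);   // prints "checkpoint"
        System.out.println(proposedFallback(checkpoint, savepoint).get().kind);  // prints "savepoint"
    }
}
```

The two methods differ only when both a checkpoint and a newer savepoint 
exist, which is exactly the case the issue description calls counterintuitive.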

> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-3397
>                 URL: https://issues.apache.org/jira/browse/FLINK-3397
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Gyula Fora
>            Priority: Minor
>
> The current fallback behaviour in case of a streaming job failure is slightly 
> counterintuitive:
> If a job fails, it falls back to the most recent checkpoint (if any) even 
> if a more recent savepoint was taken. This means that the system does not 
> regard savepoints as checkpoints, only as points from which a job can be 
> manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints 
> in case of a failure and are also used to automatically restore the 
> streaming job.



