[ 
https://issues.apache.org/jira/browse/FLINK-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813331#comment-16813331
 ] 

vinoyang commented on FLINK-12144:
----------------------------------

[~gyfora] You are welcome. It seems that FLINK-11159 makes sense. cc 
[~till.rohrmann] [~Zentol] What's your opinion?

> Option to prefer checkpoints on recovery
> ----------------------------------------
>
>                 Key: FLINK-12144
>                 URL: https://issues.apache.org/jira/browse/FLINK-12144
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>            Reporter: Gyula Fora
>            Assignee: vinoyang
>            Priority: Trivial
>
> When a streaming job fails, the getLatestCheckpoint() method of the 
> CheckpointStore is used to determine which checkpoint or savepoint is going 
> to be used for recovery.
> This behaviour is perfectly fine for jobs with relatively small states or 
> where there are no strong SLAs, but in some cases it can be problematic.
> For jobs with a very large state size, the difference between recovery times 
> from savepoints and checkpoints can be substantial to the point where it 
> might break a use-case. So we would like to avoid ever recovering from a 
> savepoint if a not too old checkpoint is also readily available.
> This cannot be avoided right now if a job fails after we have taken a 
> savepoint, for example for backup purposes (which may be scheduled multiple 
> times a day).
> I suggest we add a configuration option to allow the job to fall back to an 
> earlier checkpoint (within maybe a certain age limit) even if there is a 
> newer savepoint available to avoid lengthy downtimes.
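
The selection rule proposed above could be sketched roughly as follows. This is only an illustration of the idea, not Flink's actual implementation: the class PreferCheckpointSketch, the Snapshot type, the preferCheckpoint flag and the maxAgeMillis parameter are all hypothetical names, and measuring the age limit relative to the newest savepoint is an assumption, since the description leaves that open.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names are part of Flink's API.
final class PreferCheckpointSketch {

    // Minimal stand-in for a completed checkpoint or savepoint.
    static final class Snapshot {
        final long id;
        final long timestampMillis;
        final boolean savepoint;

        Snapshot(long id, long timestampMillis, boolean savepoint) {
            this.id = id;
            this.timestampMillis = timestampMillis;
            this.savepoint = savepoint;
        }
    }

    // 'completed' is assumed to be ordered from oldest to newest.
    static Optional<Snapshot> chooseForRecovery(
            List<Snapshot> completed, boolean preferCheckpoint, long maxAgeMillis) {
        if (completed.isEmpty()) {
            return Optional.empty();
        }
        Snapshot latest = completed.get(completed.size() - 1);
        // Current behaviour: always take the latest entry, whatever its type.
        if (!preferCheckpoint || !latest.savepoint) {
            return Optional.of(latest);
        }
        // The latest entry is a savepoint: look for the most recent checkpoint.
        for (int i = completed.size() - 2; i >= 0; i--) {
            Snapshot candidate = completed.get(i);
            if (!candidate.savepoint) {
                // Assumption: age is measured relative to the newest savepoint.
                long age = latest.timestampMillis - candidate.timestampMillis;
                return Optional.of(age <= maxAgeMillis ? candidate : latest);
            }
        }
        // No checkpoint available at all: fall back to the savepoint.
        return Optional.of(latest);
    }

    public static void main(String[] args) {
        List<Snapshot> completed = Arrays.asList(
                new Snapshot(1, 0L, false),
                new Snapshot(2, 60_000L, false),
                new Snapshot(3, 90_000L, true)); // newest entry is a savepoint
        Snapshot chosen = chooseForRecovery(completed, true, 300_000L).get();
        System.out.println("Recovering from snapshot " + chosen.id);
    }
}
```

With the flag disabled, the sketch keeps today's behaviour and returns the newest entry. With it enabled, a newer savepoint is skipped only when a checkpoint inside the age limit exists, so a job never falls back to arbitrarily old state.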



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
