[ https://issues.apache.org/jira/browse/FLINK-11159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725719#comment-16725719 ]
Till Rohrmann commented on FLINK-11159:
---------------------------------------

Wouldn't the problem be solved if Flink cached the savepoint locally after having resumed from it? We need to download the savepoint files anyway.

> Allow configuration whether to fall back to savepoints for restore
> ------------------------------------------------------------------
>
>                 Key: FLINK-11159
>                 URL: https://issues.apache.org/jira/browse/FLINK-11159
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>   Affects Versions: 1.5.5, 1.6.2, 1.7.0
>            Reporter: Nico Kruber
>            Assignee: vinoyang
>            Priority: Major
>
> Ever since FLINK-3397, upon failure, Flink would restart from the latest
> checkpoint/savepoint, whichever is more recent. With the introduction of
> local recovery and the knowledge that a RocksDB checkpoint restore would just
> copy the files, it may be time to reconsider this and make it configurable:
> In certain situations, it may be faster to restore from the latest checkpoint
> only (even if there is a more recent savepoint) and reprocess the data
> in between. On the downside, though, that may not be correct because it might
> break side effects if the savepoint was the latest one, e.g. consider this
> chain: {{chk1 -> chk2 -> sp … restore chk2 -> …}}. Then all side effects
> between {{chk2 -> sp}} would be reproduced.
> Making this configurable will allow users to choose whatever they need / can
> to get the lowest recovery time in Flink.
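
To make the trade-off in the description concrete, here is a minimal sketch of the selection logic for the chain chk1 -> chk2 -> sp. It is not Flink's actual restore code: the class RestoreSelection, the Snapshot type, and the flag fallBackToSavepoints are hypothetical names invented for this illustration, and the snapshot ids are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RestoreSelection {

    /** Hypothetical stand-in for a completed checkpoint or savepoint. */
    static final class Snapshot {
        final String name;
        final long id;              // higher id = more recent
        final boolean isSavepoint;

        Snapshot(String name, long id, boolean isSavepoint) {
            this.name = name;
            this.id = id;
            this.isSavepoint = isSavepoint;
        }
    }

    /**
     * Picks the most recent snapshot to restore from.
     * fallBackToSavepoints is an assumed option name: when false,
     * savepoints are skipped and only checkpoints are considered.
     */
    static Optional<Snapshot> select(List<Snapshot> completed, boolean fallBackToSavepoints) {
        Snapshot best = null;
        for (Snapshot s : completed) {
            if (s.isSavepoint && !fallBackToSavepoints) {
                continue; // ignore savepoints when the fallback is disabled
            }
            if (best == null || s.id > best.id) {
                best = s;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // The chain from the issue description: chk1 -> chk2 -> sp
        List<Snapshot> chain = Arrays.asList(
                new Snapshot("chk1", 1, false),
                new Snapshot("chk2", 2, false),
                new Snapshot("sp", 3, true));

        System.out.println(select(chain, true).get().name);  // prints "sp"
        System.out.println(select(chain, false).get().name); // prints "chk2"
    }
}
```

With the fallback disabled, the job resumes from chk2, so everything processed between chk2 and sp is replayed. That replay is the source of the duplicated side effects mentioned in the description.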