[ https://issues.apache.org/jira/browse/FLINK-22684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347450#comment-17347450 ]

Piotr Nowojski commented on FLINK-22684:
----------------------------------------

2. I think supporting anything more fine-grained than a simple "drop all
data" option is not worth the effort.
4. Let's assume checkpoint 42 is corrupted, and the user discovered that when
trying to recover from it. Ideally, the following cases should be handled
correctly:
a) The user wants to drop/ignore the in-flight data of this checkpoint: the
in-flight data is ignored when restarting from checkpoint 42.
b) After ignoring the in-flight data in a), the job triggers a couple more
checkpoints, but they do not complete before the job fails over again for an
unrelated reason, so it recovers from checkpoint 42 again. The in-flight data
should still be ignored, as it is corrupted and the user's intention in a) was
to ignore it.
c) After a) and possibly b), the job finally completes checkpoint 43 (or
newer). If the job fails over now (again for unrelated reasons), the in-flight
data should *NOT* be ignored when recovering from checkpoint 43 (or newer).


a) and c) are crucial must-haves IMO; b) would be really nice to have.

My idea was, instead of using a simple boolean flag
{{"ignore-in-flight-data: true/false"}}, to use
{{"ignore-in-flight-data-for-checkpoint-id: 42"}}. The user could set this
option once and completely forget about it, and everything would work as
expected. Unlike with a simple boolean flag, there is no need to unset it
later. It's simple and it should work, but that's just an idea. Maybe there is
a better way to handle it.
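A minimal sketch of the decision rule behind the checkpoint-id option, written in Python purely for illustration. The function names and the use of `None` to mean "option unset" are hypothetical, not Flink APIs. The only rule carried over from the comment is that in-flight data is dropped exactly when the restored checkpoint's id equals the configured id.

```python
from typing import List, Optional


def should_ignore_in_flight_data(restored_checkpoint_id: int,
                                 ignore_for_checkpoint_id: Optional[int]) -> bool:
    """Return True if in-flight data must be dropped for this recovery.

    ignore_for_checkpoint_id models the proposed (hypothetical here)
    "ignore-in-flight-data-for-checkpoint-id" option; None means unset.
    """
    if ignore_for_checkpoint_id is None:
        return False
    # Only the named (corrupted) checkpoint is affected; any newer
    # checkpoint is restored together with its in-flight data.
    return restored_checkpoint_id == ignore_for_checkpoint_id


def recovery_decisions(restored_ids: List[int],
                       ignore_for_checkpoint_id: Optional[int]) -> List[bool]:
    """Apply the rule to a sequence of recoveries without changing the config."""
    return [should_ignore_in_flight_data(cid, ignore_for_checkpoint_id)
            for cid in restored_ids]


# Cases a), b), c): recover from 42, fail over back to 42, later recover from 43.
print(recovery_decisions([42, 42, 43], ignore_for_checkpoint_id=42))
```

The decision depends on which checkpoint is being restored, not on a flag the user must remember to reset. Cases b) and c) are therefore handled without any further configuration change.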

> Add the ability to ignore in-flight data on recovery
> ----------------------------------------------------
>
>                 Key: FLINK-22684
>                 URL: https://issues.apache.org/jira/browse/FLINK-22684
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Anton Kalashnikov
>            Priority: Major
>
> The main case:
>  * We want to restore the last unaligned checkpoint.
>  * In-flight data of this checkpoint is corrupted.
>  * We want to ignore this corrupted data and restore only states.
> The idea is to add a new configuration parameter ('ignoreInFlightDataOnRecovery'
> or similar). If it is set to true, the metadata of in-flight data is ignored on
> the Checkpoint Coordinator side.
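The issue description proposes restoring the operator state while ignoring the metadata of in-flight data on the Checkpoint Coordinator side. This can be sketched as a pure transformation over the checkpoint metadata before it is handed to tasks. In Flink, the in-flight data of unaligned checkpoints is stored as channel state alongside the operator state. However, the `SubtaskState` type, its field names, and `strip_in_flight_data` below are hypothetical simplifications, not Flink's actual classes.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubtaskState:
    """Hypothetical, simplified stand-in for per-subtask checkpoint metadata."""
    operator_state: Tuple[str, ...]       # handles to operator/keyed state
    input_channel_state: Tuple[str, ...]  # handles to in-flight input data
    output_buffer_state: Tuple[str, ...]  # handles to in-flight output data


def strip_in_flight_data(states: Dict[str, SubtaskState]) -> Dict[str, SubtaskState]:
    """Return a copy of the metadata with all in-flight data handles dropped.

    Operator state is kept untouched; the original mapping is not modified.
    """
    return {name: replace(s, input_channel_state=(), output_buffer_state=())
            for name, s in states.items()}


# Example: restore only the operator state of a (corrupted) checkpoint 42.
checkpoint_42 = {"map-0": SubtaskState(("op-state-0",), ("in-0",), ("out-0",))}
restored = strip_in_flight_data(checkpoint_42)
print(restored["map-0"])
```

Doing this once, on the coordinator side, means individual tasks need no special logic: they simply see no in-flight data to recover.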



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
