[ https://issues.apache.org/jira/browse/FLINK-22684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347450#comment-17347450 ]
Piotr Nowojski commented on FLINK-22684:
----------------------------------------

2. I think supporting anything more fine-grained than a simple "drop all data" does not look worth the effort.

4. Let's assume checkpoint 42 is corrupted. The user discovered that when trying to recover from it. Ideally this should correctly handle the following cases:
a) The user wants to drop/ignore in-flight data for this checkpoint: in-flight data is ignored when restarting from checkpoint 42.
b) After ignoring in-flight data in a), the job triggers a couple more checkpoints, but they do not complete before the job fails over again for an unrelated reason. It fails over from checkpoint 42 again. In-flight data should be ignored, as it is corrupted and the user's intention in a) was to ignore it.
c) After a) and maybe b), the job finally completes checkpoint 43 (or newer) successfully. If the job fails over now (again for unrelated reasons), in-flight data should *NOT* be ignored when recovering from checkpoint 43 (or newer).

a) and c) are crucial must-haves IMO. b) would be really nice to have.

My idea was, instead of using a simple boolean flag {{"ignore-in-flight-data: true/false"}}, to use {{"ignore-in-flight-data-for-checkpoint-id: 42"}}. The user could set this option once and completely forget about it, and everything would work as expected. There would be no need to unset the flag, as there is with the simple boolean flag. It's simple and it should work, but that's just an idea. Maybe there is a better way to handle it.

> Add the ability to ignore in-flight data on recovery
> ----------------------------------------------------
>
>                 Key: FLINK-22684
>                 URL: https://issues.apache.org/jira/browse/FLINK-22684
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Anton Kalashnikov
>            Priority: Major
>
> The main case:
> * We want to restore the last unaligned checkpoint.
> * In-flight data of this checkpoint is corrupted.
> * We want to ignore this corrupted data and restore only states.
> The idea is to have a new configuration parameter ('ignoreInFlightDataOnRecovery'
> or similar). If it is set to true, the metadata of in-flight data is ignored on
> the Checkpoint Coordinator side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
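
To make cases a), b) and c) from the comment concrete, here is a minimal sketch of how a per-checkpoint option could be evaluated at restore time. It is only an illustration under stated assumptions: the class {{InFlightDataPolicy}} and the method {{shouldIgnoreInFlightData}} are hypothetical names, not part of Flink's API, and the option value is assumed to arrive as an optional checkpoint id.

```java
import java.util.OptionalLong;

/**
 * Hypothetical helper (not a Flink class): decides whether in-flight data
 * should be dropped when restoring from a given checkpoint.
 */
final class InFlightDataPolicy {

    // Assumed to hold the value of the proposed option
    // "ignore-in-flight-data-for-checkpoint-id"; empty if the option is unset.
    private final OptionalLong ignoredCheckpointId;

    InFlightDataPolicy(OptionalLong ignoredCheckpointId) {
        this.ignoredCheckpointId = ignoredCheckpointId;
    }

    /**
     * Returns true only when the checkpoint being restored is exactly the one
     * the user marked as having corrupted in-flight data.
     */
    boolean shouldIgnoreInFlightData(long restoredCheckpointId) {
        return ignoredCheckpointId.isPresent()
                && ignoredCheckpointId.getAsLong() == restoredCheckpointId;
    }
}
```

With the option set to 42, every restore from checkpoint 42 drops in-flight data (cases a and b), while a restore from checkpoint 43 or newer keeps it (case c), so the user never has to unset anything. A plain boolean flag cannot distinguish these cases because it carries no checkpoint id.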