[ https://issues.apache.org/jira/browse/FLINK-22494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias reassigned FLINK-22494:
--------------------------------

    Assignee: Matthias

> Avoid discarding checkpoints in case of failure
> -----------------------------------------------
>
>                 Key: FLINK-22494
>                 URL: https://issues.apache.org/jira/browse/FLINK-22494
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Coordination
>    Affects Versions: 1.13.0, 1.14.0, 1.12.3
>            Reporter: Matthias
>            Assignee: Matthias
>            Priority: Critical
>             Fix For: 1.14.0, 1.13.1, 1.12.4
>
>
> Both {{StateHandleStore}} implementations (i.e.
> [KubernetesStateHandleStore:157|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/highavailability/KubernetesStateHandleStore.java#L157]
> and
> [ZooKeeperStateHandleStore:170|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java#L170])
> discard checkpoints if the checkpoint metadata wasn't written to the
> backend.
> This does not cover the cases where the data was actually written to the
> backend but the call failed anyway (e.g. due to network issues). In such a
> case, we might end up having a pointer in the backend pointing to a
> checkpoint that was discarded.
> Instead of discarding the checkpoint data in this case, we might want to keep
> it for this specific use case. Otherwise, we might run into Exceptions when
> recovering from the Checkpoint later on. We might want to add a warning to
> the user pointing to the possibly orphaned checkpoint data.
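
The distinction described in the issue can be illustrated with a minimal Java sketch. It is not Flink code: `Backend`, `CheckpointData`, `NotWrittenException`, `PossiblyWrittenException` and `addCheckpoint` are hypothetical names chosen for illustration, and the assumption that a backend can tell a definite failure apart from an ambiguous one is ours. The idea is only that the data is discarded when the pointer was certainly not written, and kept (with a warning) when the outcome is unknown.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types exist in Flink. */
class CheckpointStoreSketch {

    /** Hypothetical: the backend confirmed that no pointer was written. */
    static class NotWrittenException extends Exception {
        NotWrittenException(String msg) { super(msg); }
    }

    /** Hypothetical: the call failed, but the write may still have gone through. */
    static class PossiblyWrittenException extends Exception {
        PossiblyWrittenException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the HA backend that stores the pointer. */
    interface Backend {
        void writePointer(String key, String location)
                throws NotWrittenException, PossiblyWrittenException;
    }

    /** Hypothetical stand-in for checkpoint data that can be discarded. */
    static class CheckpointData {
        final String location;
        boolean discarded = false;

        CheckpointData(String location) { this.location = location; }

        void discard() { discarded = true; }
    }

    /** Warnings are collected in a list instead of a logger to keep the sketch self-contained. */
    static final List<String> WARNINGS = new ArrayList<>();

    /** Returns true only if the pointer was stored without any failure. */
    static boolean addCheckpoint(Backend backend, String key, CheckpointData data) {
        try {
            backend.writePointer(key, data.location);
            return true;
        } catch (NotWrittenException e) {
            // No pointer can exist in the backend, so discarding the data is safe.
            data.discard();
            return false;
        } catch (PossiblyWrittenException e) {
            // A pointer may exist: keep the data and warn about possible orphans.
            WARNINGS.add("Checkpoint data at " + data.location
                    + " may be orphaned: " + e.getMessage());
            return false;
        }
    }
}
```

In this sketch a network-style failure leaves `discarded` as `false` and adds one entry to `WARNINGS`, so a later recovery that finds the pointer can still read the data it refers to.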