Stefan Richter created FLINK-6533:
-------------------------------------
Summary: Duplicated registration of new shared state when
checkpoint confirmations are still pending
Key: FLINK-6533
URL: https://issues.apache.org/jira/browse/FLINK-6533
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Affects Versions: 1.3.0
Reporter: Stefan Richter
Assignee: Stefan Richter
Priority: Blocker
Each incremental RocksDB checkpoint n registers its new and existing shared
state with the {{SharedStateRegistry}} when it completes. Only then is the
backend notified, after which all following checkpoints (n+x) can reference the
new state from checkpoint n.
However, when checkpoint n+1 starts before n has been confirmed to the
backend, n+1 can treat some files as new even though they were already
contained in n. It will upload those files to DFS again, creating new state
handles.
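To make the timing problem concrete, here is a minimal sketch of a backend that only learns about registered files on confirmation. The class {{BackendViewSketch}} and its methods are hypothetical and only model the behavior described here; they are not Flink's actual backend code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the backend's view; not Flink's real implementation.
class BackendViewSketch {
    // Files the backend knows were registered by a confirmed checkpoint.
    private final Set<String> confirmedFiles = new HashSet<>();

    // Files the backend would treat as "new" (and upload) for a checkpoint
    // that currently consists of the given files.
    Set<String> filesToUpload(Set<String> currentFiles) {
        Set<String> assumedNew = new HashSet<>(currentFiles);
        assumedNew.removeAll(confirmedFiles);
        return assumedNew;
    }

    // Invoked only when the confirmation for a checkpoint reaches the backend.
    void notifyCheckpointComplete(Set<String> checkpointFiles) {
        confirmedFiles.addAll(checkpointFiles);
    }
}
```

If checkpoint n contains a file and n+1 calls {{filesToUpload}} before {{notifyCheckpointComplete}} has run for n, that file is reported as new a second time.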
Then, once n+1 completes, it may register some state as new that was already
registered by n, without n+1 knowing about it. Currently this violates a
precondition check that the reference count for state assumed to be new
is 1.
While we cannot prevent duplicate uploads, we must resolve this situation in
the {{SharedStateRegistry}}.
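One possible way to resolve this, sketched below under stated assumptions, is to let the registry tolerate a second registration under the same key: it keeps the handle that was registered first, increments the reference count, and returns the surviving handle so the caller can discard its duplicate upload. The class {{SharedStateRegistrySketch}}, its method names, and the example paths are hypothetical; this is not the actual {{SharedStateRegistry}} API or the fix that was committed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry sketch; names and behavior are assumptions.
class SharedStateRegistrySketch {
    private static final class Entry {
        final String handle;   // location of the uploaded file
        int referenceCount;    // number of checkpoints referencing it

        Entry(String handle) {
            this.handle = handle;
            this.referenceCount = 1;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Registers state under the given key and returns the handle that the
    // caller should reference from now on. If the key is already known,
    // the first handle is kept and only the reference count is increased,
    // instead of failing a "reference count must be 1" precondition.
    synchronized String register(String key, String handle) {
        Entry existing = entries.get(key);
        if (existing == null) {
            entries.put(key, new Entry(handle));
            return handle;
        }
        existing.referenceCount++;
        return existing.handle;
    }

    // Returns the current reference count, or 0 for an unknown key.
    synchronized int referenceCount(String key) {
        Entry existing = entries.get(key);
        return existing == null ? 0 : existing.referenceCount;
    }
}
```

With this behavior, if n registers a file and n+1 later registers the same key with a different handle, n+1 receives the handle from n and the count becomes 2. The caller would then be responsible for deleting its own redundant upload.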
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)