[jira] [Created] (FLINK-6533) Duplicated registration of new shared state when checkpoint confirmations are still pending

Stefan Richter (JIRA) Thu, 11 May 2017 01:40:54 -0700

Stefan Richter created FLINK-6533:
-------------------------------------

             Summary: Duplicated registration of new shared state when 
checkpoint confirmations are still pending
                 Key: FLINK-6533
                 URL: https://issues.apache.org/jira/browse/FLINK-6533
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.3.0
            Reporter: Stefan Richter
            Assignee: Stefan Richter
            Priority: Blocker



Each incremental RocksDB checkpoint n registers its new and existing shared 
state with the {{SharedStateRegistry}} when it completes. Only then is the 
backend notified, so that all following checkpoints (n+x) can reference the 
new state from checkpoint n.

However, when a checkpoint n+1 starts before n has been confirmed to the 
backend, n+1 can treat some files as new even though they were already 
contained in n. It will upload those files to DFS again, creating new state 
handles.

Then, once n+1 completes, it could try to register some state as new that was 
already registered by n, without n+1 knowing about this. Currently this 
violates a precondition check that the reference count for state that is 
assumed to be new is 1.
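
The sequence above can be reproduced with a toy model. The following is only 
a minimal sketch, not the actual {{SharedStateRegistry}} code: the class 
{{ToyStrictRegistry}}, its methods, and the file name {{000012.sst}} are 
hypothetical and chosen for illustration. It assumes the registry keeps one 
reference count per file key and rejects any "new" registration that would 
not leave the count at 1.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy reference-counting registry. Hypothetical; NOT Flink's SharedStateRegistry. */
class ToyStrictRegistry {

    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Registers a file that a checkpoint believes to be new; the count must end up at 1. */
    void registerNew(String fileKey) {
        int count = refCounts.getOrDefault(fileKey, 0) + 1;
        if (count != 1) {
            // Models the precondition check described in the report.
            throw new IllegalStateException(
                "State assumed to be new, but reference count would be " + count + ": " + fileKey);
        }
        refCounts.put(fileKey, count);
    }

    /** Registers a file that the checkpoint knows was already shared by an earlier checkpoint. */
    void registerExisting(String fileKey) {
        refCounts.put(fileKey, refCounts.getOrDefault(fileKey, 0) + 1);
    }

    int refCount(String fileKey) {
        return refCounts.getOrDefault(fileKey, 0);
    }

    public static void main(String[] args) {
        ToyStrictRegistry registry = new ToyStrictRegistry();

        // Checkpoint n completes and registers the file as new.
        registry.registerNew("000012.sst");

        // Checkpoint n+1 started before n was confirmed to the backend,
        // so it also believes the file is new and registers it as such.
        try {
            registry.registerNew("000012.sst");
        } catch (IllegalStateException e) {
            System.out.println("Precondition violated: " + e.getMessage());
        }
    }
}
```

Had n been confirmed to the backend before n+1 started, n+1 would have called 
{{registerExisting}} in this model and the check would not have fired.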

While we cannot prevent duplicate uploads, we must resolve this situation in 
the {{SharedStateRegistry}}.
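
One way such a resolution might look is sketched below. This is an 
assumption for illustration, not the fix proposed in this issue: the class 
{{ToyTolerantRegistry}}, the {{Handle}} type, and the DFS paths are all 
hypothetical. The idea is that a duplicate registration under an existing key 
keeps the first handle, raises its reference count, and marks the second 
upload as discardable instead of failing a precondition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy registry that tolerates duplicate registrations. Hypothetical; NOT Flink's API. */
class ToyTolerantRegistry {

    /** Hypothetical state handle: a logical file key plus the DFS path of one upload. */
    static final class Handle {
        final String key;
        final String dfsPath;

        Handle(String key, String dfsPath) {
            this.key = key;
            this.dfsPath = dfsPath;
        }
    }

    private static final class Entry {
        final Handle handle;
        int refCount;

        Entry(Handle handle) {
            this.handle = handle;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final List<Handle> discarded = new ArrayList<>();

    /** Registers a handle and returns the handle that checkpoints should reference. */
    Handle register(Handle candidate) {
        Entry entry = entries.get(candidate.key);
        if (entry == null) {
            // First registration under this key: the candidate becomes the shared copy.
            entry = new Entry(candidate);
            entries.put(candidate.key, entry);
        } else if (!entry.handle.dfsPath.equals(candidate.dfsPath)) {
            // Duplicate upload of already registered state: keep the first copy
            // and remember the second one so that it can be deleted.
            discarded.add(candidate);
        }
        entry.refCount++;
        return entry.handle;
    }

    int refCount(String key) {
        Entry entry = entries.get(key);
        return entry == null ? 0 : entry.refCount;
    }

    /** Handles whose uploads turned out to be duplicates. */
    List<Handle> discarded() {
        return discarded;
    }
}
```

In this model the caller must replace its own handle with the one returned by 
{{register}}, so that checkpoint n+1 ends up referencing the copy uploaded by 
n rather than its own duplicate.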



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
