Stefan Richter created FLINK-5715:
-------------------------------------

             Summary: Asynchronous snapshotting for HeapKeyedStateBackend
                 Key: FLINK-5715
                 URL: https://issues.apache.org/jira/browse/FLINK-5715
             Project: Flink
          Issue Type: New Feature
          Components: State Backends, Checkpointing
    Affects Versions: 1.3.0
            Reporter: Stefan Richter
            Assignee: Stefan Richter

Blocking snapshots render the HeapKeyedStateBackend practically unusable for many users in production. Their jobs cannot tolerate processing being stopped for the time it takes to write gigabytes of data from memory to disk. Asynchronous snapshots would solve this problem.

The challenge for the implementation is coming up with a copy-on-write (CoW) scheme for the in-memory hash maps that form the foundation of this backend. On closer inspection, the problem is twofold. First, providing CoW semantics for the hash map itself, as a mutable structure, while avoiding costly locking or blocking where possible. Second, providing CoW for the mutable value objects, e.g. through cloning via serializers.
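
To make the two parts of the problem concrete, here is a minimal sketch of one possible versioned copy-on-write scheme. It is not the proposed implementation: the names CowStateMapSketch, CowMap, getForUpdate and the copier function are hypothetical, and the copier merely stands in for cloning a value through its serializer. The sketch is single-threaded; it only shows how a snapshot can stay frozen while the live map keeps changing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class CowStateMapSketch {

    /** Hypothetical copy-on-write map; an illustration, not a Flink class. */
    static final class CowMap<K, V> {

        /** Immutable entry tagged with the map version it was written in. */
        private static final class Entry<V> {
            final V value;
            final int version;

            Entry(V value, int version) {
                this.value = value;
                this.version = version;
            }
        }

        private final Map<K, Entry<V>> live = new HashMap<>();
        private final UnaryOperator<V> copier; // assumption: stands in for a serializer-based deep copy
        private int version = 0;               // incremented by every snapshot

        CowMap(UnaryOperator<V> copier) {
            this.copier = copier;
        }

        void put(K key, V value) {
            // Replaces the entry in the live map only; snapshots hold their own references.
            live.put(key, new Entry<>(value, version));
        }

        /** Returns a value the caller may mutate without affecting any snapshot. */
        V getForUpdate(K key) {
            Entry<V> e = live.get(key);
            if (e == null) {
                return null;
            }
            if (e.version < version) {
                // The value may be shared with a snapshot: clone it once, then reuse the clone.
                e = new Entry<>(copier.apply(e.value), version);
                live.put(key, e);
            }
            return e.value;
        }

        /** Synchronous part of a snapshot: copies references only, not the values. */
        Map<K, V> snapshot() {
            Map<K, V> frozen = new HashMap<>();
            for (Map.Entry<K, Entry<V>> me : live.entrySet()) {
                frozen.put(me.getKey(), me.getValue().value);
            }
            version++; // everything written so far now counts as shared
            return frozen;
        }
    }

    public static void main(String[] args) {
        CowMap<String, List<String>> state = new CowMap<>(v -> new ArrayList<>(v));
        List<String> initial = new ArrayList<>();
        initial.add("a");
        state.put("k", initial);

        Map<String, List<String>> snap = state.snapshot();
        state.getForUpdate("k").add("b"); // triggers the lazy clone of the value

        System.out.println(snap);                    // prints {k=[a]}
        System.out.println(state.getForUpdate("k")); // prints [a, b]
    }
}
```

In this sketch the expensive work (writing the frozen map to disk) could run on another thread, since the live map never mutates an object the snapshot can reach. The cost is one clone per value on its first update after a snapshot, plus the reference copy made in snapshot().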