[ https://issues.apache.org/jira/browse/FLINK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-6485: ---------------------------------- Labels: stale-major (was: ) > Use buffering to avoid frequent memtable flushes for short intervals in > RockdDB incremental checkpoints > ------------------------------------------------------------------------------------------------------- > > Key: FLINK-6485 > URL: https://issues.apache.org/jira/browse/FLINK-6485 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing > Reporter: Stefan Richter > Priority: Major > Labels: stale-major > > The current implementation of incremental checkpoitns in RocksDB needs to > flush the memtable to disk prior to a checkpoint and this will generate a SST > file. > What is required for fast checkpoint intervals is an alternative mechanism to > quickly determine a delta from the previous incremental checkpoint to avoid > this frequent flushing. This could be implemented through custom buffering > inside the backend, e.g. a changelog buffer that is maintain up to a certain > size. > The buffer's content becomes part of the private state in the incremental > snapshot and the buffer is dropped i) after each checkpoint or ii) after > exceeding a certain size that justifies flushing and writing a new SST file. > This mechanism should not be blocking, which we can achieve in the following > way: > 1) We have a clear upper limit to the buffer size (e.g. 64MB), once the limit > of diffs is reached, we can drop the buffer because we can assume enough work > was done to justify a new SST file > 2) We write the buffer to a local FS, so we can expect this to be reasonable > fast and that it will not suffer from the kind of blocking that we have in > DFS. I mean technically, also flushing the SST file can block. Then, in the > async part, we can transfer the locally written buffer file to DFS. > There might be other mechanisms in RocksDB that we could exploit for this, > such as the write ahead log, but this could be already be a good solution. -- This message was sent by Atlassian Jira (v8.3.4#803005)