[ https://issues.apache.org/jira/browse/FLINK-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997994#comment-15997994 ]
ASF GitHub Bot commented on FLINK-6364:
---------------------------------------

Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/3801

    I am sorry, but before merging I noticed that some tests (e.g. `RocksDBStateBackendTest.testCancelRunningSnapshot`) fail sporadically (only on Travis). I tracked down the problem, and I think the cause is that `cancel()` does not eagerly close the open streams, so blocking IO calls are never interrupted.

    I suggest the following fix: `RocksDBIncrementalSnapshotOperation` should have its own `CloseableRegistry`. This registry tracks all streams opened during the checkpoint and is registered with the backend's registry for as long as the task runs. Then, as the first step of `cancel()`, we close and unregister that inner `CloseableRegistry`.

    This also prevents a race in which `cancel()` asynchronously closes the current stream while the checkpointing has already opened the next one: once the registry is closed, it closes any new stream that tries to register and blocks it.

> Implement incremental checkpointing in RocksDBStateBackend
> ----------------------------------------------------------
>
>                 Key: FLINK-6364
>                 URL: https://issues.apache.org/jira/browse/FLINK-6364
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Assignee: Xiaogang Shi
>
> {{RocksDBStateBackend}} is well suited for incremental checkpointing because
> RocksDB is based on LSM trees, which record updates in new sst files, and all
> sst files are immutable. By materializing only those new sst files, we can
> significantly improve the performance of checkpointing.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
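The nested-registry fix proposed in the comment above can be sketched in plain Java. This is a minimal, self-contained sketch, not Flink's code: `SimpleCloseableRegistry`, `TrackedStream`, `IncrementalSnapshotOperation`, and `SnapshotCancelDemo` are hypothetical stand-ins for Flink's `CloseableRegistry`, a checkpoint output stream, and `RocksDBIncrementalSnapshotOperation`. It assumes the registry behavior described in the comment: closing it closes every tracked stream, and any stream registered afterwards is closed immediately and rejected.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Flink's CloseableRegistry (not its real API): once closed,
// it closes anything registered afterwards and rejects it with an IOException.
class SimpleCloseableRegistry implements Closeable {
    private final Set<Closeable> registered = new LinkedHashSet<>();
    private boolean closed = false;

    synchronized void register(Closeable resource) throws IOException {
        if (closed) {
            resource.close(); // a stream opened after cancel() must not stay open
            throw new IOException("Registry is already closed; closed the resource instead.");
        }
        registered.add(resource);
    }

    synchronized boolean unregister(Closeable resource) {
        return registered.remove(resource);
    }

    synchronized int size() {
        return registered.size();
    }

    @Override
    public void close() throws IOException {
        List<Closeable> toClose;
        synchronized (this) {
            if (closed) return;
            closed = true; // from now on, register() rejects new resources
            toClose = new ArrayList<>(registered);
            registered.clear();
        }
        // Close outside the lock; closing a stream is what unblocks IO in the snapshot thread.
        IOException failure = null;
        for (Closeable resource : toClose) {
            try {
                resource.close();
            } catch (IOException e) {
                if (failure == null) failure = e; else failure.addSuppressed(e);
            }
        }
        if (failure != null) throw failure;
    }
}

// Hypothetical placeholder for a checkpoint output stream; it only records whether it was closed.
class TrackedStream implements Closeable {
    volatile boolean closed = false;
    @Override public void close() { closed = true; }
}

// Simplified, hypothetical shape of the proposed RocksDBIncrementalSnapshotOperation change.
class IncrementalSnapshotOperation {
    private final SimpleCloseableRegistry backendRegistry;
    private final SimpleCloseableRegistry snapshotRegistry = new SimpleCloseableRegistry();

    IncrementalSnapshotOperation(SimpleCloseableRegistry backendRegistry) throws IOException {
        this.backendRegistry = backendRegistry;
        backendRegistry.register(snapshotRegistry); // inner registry lives in the backend's registry
    }

    TrackedStream openStream() throws IOException {
        TrackedStream stream = new TrackedStream();
        snapshotRegistry.register(stream); // closes the stream and throws if cancel() already ran
        return stream;
    }

    void cancel() throws IOException {
        try {
            snapshotRegistry.close(); // first step: close every stream the snapshot has open
        } finally {
            backendRegistry.unregister(snapshotRegistry);
        }
    }
}

class SnapshotCancelDemo {
    public static void main(String[] args) throws IOException {
        SimpleCloseableRegistry backendRegistry = new SimpleCloseableRegistry();
        IncrementalSnapshotOperation snapshot = new IncrementalSnapshotOperation(backendRegistry);
        TrackedStream first = snapshot.openStream();
        snapshot.cancel();
        System.out.println("first stream closed: " + first.closed);
        System.out.println("backend registry size: " + backendRegistry.size());
        try {
            snapshot.openStream(); // simulates the snapshot thread racing with cancel()
            System.out.println("second stream opened");
        } catch (IOException e) {
            System.out.println("second stream rejected");
        }
    }
}
```

Running `SnapshotCancelDemo` prints `first stream closed: true`, `backend registry size: 0`, and `second stream rejected`. Copying the tracked set under the lock and closing the streams outside it keeps a slow `close()` from holding the registry's monitor while other threads try to register or unregister.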