GitHub user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/3801

I am sorry, but before merging I noticed that some tests (e.g. `RocksDBStateBackendTest.testCancelRunningSnapshot`) fail sporadically, and only on Travis. I tracked the problem down, and I think the cause is that `cancel()` does not eagerly close the open streams, so blocking IO calls are never interrupted.

I suggest the following fix: `RocksDBIncrementalSnapshotOperation` should have its own `CloseableRegistry`. This registry tracks all the streams opened during checkpointing and is registered with the backend's registry for as long as the task runs. Then, as a first step in `cancel()`, we can close and unregister that inner `CloseableRegistry`.

This also prevents a race in which `cancel()` asynchronously closes the current stream while the checkpointing has already opened the next one: once the registry is closed, it closes and rejects any new stream on registration.
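To illustrate the proposal, here is a minimal sketch in plain Java. It is not Flink's actual `CloseableRegistry` or `RocksDBIncrementalSnapshotOperation`; the classes `SketchCloseableRegistry` and `SketchSnapshotOperation` and all their method names are hypothetical stand-ins that only assume the behavior described above: the inner registry tracks open streams, `cancel()` closes it first, and a closed registry closes and rejects any stream registered afterwards.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a closeable registry; not Flink's class. */
final class SketchCloseableRegistry implements Closeable {
    private final Object lock = new Object();
    private final Set<Closeable> open = new HashSet<>();
    private boolean closed;

    /** Tracks a stream, or closes and rejects it if the registry is already closed. */
    void register(Closeable c) throws IOException {
        synchronized (lock) {
            if (!closed) {
                open.add(c);
                return;
            }
        }
        // Registry already closed: close the newcomer so no stream outlives cancel().
        c.close();
        throw new IOException("Registry is closed; stream was closed and rejected.");
    }

    boolean unregister(Closeable c) {
        synchronized (lock) {
            return open.remove(c);
        }
    }

    boolean isClosed() {
        synchronized (lock) {
            return closed;
        }
    }

    /** Closes every tracked stream; closing a stream unblocks IO calls pending on it. */
    @Override
    public void close() throws IOException {
        List<Closeable> toClose;
        synchronized (lock) {
            if (closed) {
                return;
            }
            closed = true;
            toClose = new ArrayList<>(open);
            open.clear();
        }
        IOException first = null;
        for (Closeable c : toClose) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}

/** Hypothetical snapshot operation that owns an inner registry. */
final class SketchSnapshotOperation {
    private final SketchCloseableRegistry backendRegistry;
    private final SketchCloseableRegistry innerRegistry = new SketchCloseableRegistry();

    SketchSnapshotOperation(SketchCloseableRegistry backendRegistry) throws IOException {
        this.backendRegistry = backendRegistry;
        // The inner registry is itself registered with the backend's registry.
        backendRegistry.register(innerRegistry);
    }

    /** Every stream opened during checkpointing goes through the inner registry. */
    void registerStream(Closeable stream) throws IOException {
        innerRegistry.register(stream);
    }

    /** First step of cancellation: close the inner registry, then unregister it. */
    void cancel() throws IOException {
        try {
            innerRegistry.close();
        } finally {
            backendRegistry.unregister(innerRegistry);
        }
    }
}
```

In this sketch, a checkpointing thread that calls `registerStream` after `cancel()` has run gets an `IOException` and its stream is closed immediately, which is the race the comment describes. The real patch may order the close and unregister steps differently.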