[ https://issues.apache.org/jira/browse/FLINK-32643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746180#comment-17746180 ]
Hangxiang Yu commented on FLINK-32643:
--------------------------------------

Hi, thanks for the proposal.
I have some questions about it, PTAL:
{quote}we cannot set it too large, such as 512M, this may cause OOM
{quote}
We also use a large block cache size in our production environment, and it works well in most cases.
Do you mean that the lack of a strict capacity limit on RocksDB memory usage [1] may cause OOM?
{quote}and each DB cannot effectively utilize memory
{quote}
Memory sharing between RocksDB instances has already been implemented [2]. Could this help to resolve the problem?
{quote}introduce off-heap shared state cache across multiple db instances for stateful operators in TM.
{quote}
What is the cache type: a read-write cache or a read-only cache?
What is the data structure?
What are the caching strategy and granularity? Caching at the record level may increase the overhead per record.
Could this also increase the space overhead compared to the block cache (due to compression)?

Maybe I missed some of the details. I'm also interested in this, so I'd like to learn more.

[1] https://issues.apache.org/jira/browse/FLINK-15532
[2] https://issues.apache.org/jira/browse/FLINK-29928

> Introduce off-heap shared state cache across stateful operators in TM
> ---------------------------------------------------------------------
>
>                 Key: FLINK-32643
>                 URL: https://issues.apache.org/jira/browse/FLINK-32643
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.19.0
>            Reporter: Fang Yong
>            Priority: Major
>
> Currently each stateful operator will create an independent db instance if it
> uses rocksdb as state backend, and we can configure
> `state.backend.rocksdb.block.cache-size` for each db to speed up state
> performance. This parameter defaults to 8M, and we cannot set it too large,
> such as 512M, this may cause OOM and each DB cannot effectively utilize
> memory.
> To address this issue, we would like to introduce off-heap shared
> state cache across multiple db instances for stateful operators in TM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
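
The two ideas discussed in this thread, one cache budget shared by several db instances and a strict capacity limit on that budget, can be illustrated with a small toy model. This is only a sketch under stated assumptions: `SharedBlockCache`, its methods, and the `db-a`/`db-b` identifiers are hypothetical names invented for illustration, not Flink or RocksDB APIs, and the eviction policy is a plain byte-bounded LRU rather than whatever the proposal would actually use.

```python
from collections import OrderedDict


class SharedBlockCache:
    """Toy byte-bounded LRU cache shared by several db instances.

    Hypothetical illustration only; this is not a Flink or RocksDB API.
    """

    def __init__(self, capacity_bytes, strict_capacity_limit=True):
        self.capacity_bytes = capacity_bytes
        self.strict_capacity_limit = strict_capacity_limit
        self.used_bytes = 0
        # (db_id, block_id) -> block bytes, ordered from least to most recently used
        self._blocks = OrderedDict()

    def get(self, db_id, block_id):
        key = (db_id, block_id)
        block = self._blocks.get(key)
        if block is not None:
            self._blocks.move_to_end(key)  # mark as most recently used
        return block

    def put(self, db_id, block_id, block):
        key = (db_id, block_id)
        if self.strict_capacity_limit and len(block) > self.capacity_bytes:
            return False  # refuse the insert instead of exceeding the budget
        old = self._blocks.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        # Evict least recently used blocks, regardless of which instance owns them.
        while self._blocks and self.used_bytes + len(block) > self.capacity_bytes:
            _, evicted = self._blocks.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._blocks[key] = block
        self.used_bytes += len(block)
        return True


# Two db instances share one 64-byte budget.
cache = SharedBlockCache(capacity_bytes=64)
cache.put("db-a", 0, b"x" * 32)
cache.put("db-b", 0, b"y" * 32)
cache.get("db-a", 0)             # db-a's block becomes most recently used
cache.put("db-b", 1, b"z" * 32)  # evicts db-b's block 0, the LRU entry
```

In this model a busy instance can use most of the shared budget while an idle one gives its share up, which is the utilization argument for sharing. With the strict limit enabled, an oversized insert is rejected, so the total never exceeds `capacity_bytes`.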