[ 
https://issues.apache.org/jira/browse/FLINK-32643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746180#comment-17746180
 ] 

Hangxiang Yu commented on FLINK-32643:
--------------------------------------

Hi, thanks for the proposal.
I have some questions about it, PTAL:
{quote}we cannot set it too large, such as 512M, this may cause OOM
{quote}
We also use a large block cache size in our production environment, and it
works well in most cases.

Do you mean that the lack of a strict capacity limit on RocksDB memory usage
[1] may cause OOM?
{quote}and each DB cannot effectively utilize memory
{quote}
Memory sharing between RocksDB instances has already been implemented [2].
Could this help resolve the issue?
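
To make the memory concern concrete, here is a rough back-of-the-envelope
sketch. It assumes, as the issue description says, that every stateful
operator owns its own DB instance and that
`state.backend.rocksdb.block.cache-size` applies per DB. The instance count
(16) and the helper name are hypothetical and only for illustration.

```python
MIB = 1024 * 1024


def total_block_cache_bytes(db_instances: int, cache_per_db_bytes: int) -> int:
    """Upper bound on block cache memory when each DB instance has its own cache."""
    return db_instances * cache_per_db_bytes


# Hypothetical TaskManager hosting 16 RocksDB instances.
instances = 16

# Default per-DB block cache size mentioned in the issue (8M).
default_total = total_block_cache_bytes(instances, 8 * MIB)

# The "too large" per-DB setting mentioned in the issue (512M).
large_total = total_block_cache_bytes(instances, 512 * MIB)

print(default_total // MIB, "MiB")  # 128 MiB
print(large_total // MIB, "MiB")    # 8192 MiB
```

With independent caches the upper bound grows linearly with the number of
instances, whereas a shared budget stays fixed regardless of how many
instances draw from it.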
{quote}introduce off-heap shared state cache across multiple db instances for 
stateful operators in TM.
{quote}
What is the cache type: a read-write cache or a read-only cache? And what is
the data structure?

What are the caching strategy and granularity? Caching at the record level may
increase the per-record overhead.

Could this also increase the space overhead compared to the Block Cache (due
to compression)?

Maybe I missed some details. I'm also interested in this and would like to
learn more.


[1] https://issues.apache.org/jira/browse/FLINK-15532
[2] https://issues.apache.org/jira/browse/FLINK-29928

> Introduce off-heap shared state cache across stateful operators in TM
> ---------------------------------------------------------------------
>
>                 Key: FLINK-32643
>                 URL: https://issues.apache.org/jira/browse/FLINK-32643
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.19.0
>            Reporter: Fang Yong
>            Priority: Major
>
> Currently each stateful operator will create an independent db instance if it 
> uses rocksdb as state backend, and we can configure 
> `state.backend.rocksdb.block.cache-size` for each db to speed up state 
> performance. This parameter defaults to 8M, and we cannot set it too large, 
> such as 512M, this may cause OOM and each DB cannot effectively utilize 
> memory. To address this issue, we would like to introduce off-heap shared 
> state cache across multiple db instances for stateful operators in TM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
