[ https://issues.apache.org/jira/browse/FLINK-31089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689454#comment-17689454 ]
xiaogang zhou commented on FLINK-31089:
---------------------------------------

[~yunta] It looks like we are already using jemalloc:

$ /usr/bin/pmap -x 1 | grep malloc
00007f434e9aa000     204     204       0 r-x-- libjemalloc.so.1
00007f434e9aa000       0       0       0 r-x-- libjemalloc.so.1
00007f434e9dd000    2044       0       0 ----- libjemalloc.so.1
00007f434e9dd000       0       0       0 ----- libjemalloc.so.1
00007f434ebdc000       8       8       8 r---- libjemalloc.so.1
00007f434ebdc000       0       0       0 r---- libjemalloc.so.1
00007f434ebde000       4       4       4 rw--- libjemalloc.so.1
00007f434ebde000       0       0       0 rw--- libjemalloc.so.1

As for 'state.backend.rocksdb.memory.partitioned-index-filters': yes, we configured it as true. Without the two_level_index_cache, the RocksDB performance is really low. And flink_taskmanager_job_task_operator_.*rocksdb_block_cache_pinned_usage can grow quickly if PinL0FilterAndIndexBlocksInCache is left set to true.

> pin L0 index in memory can lead to slow memory grow finally lead to memory beyond limit
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-31089
>                 URL: https://issues.apache.org/jira/browse/FLINK-31089
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.16.1
>            Reporter: xiaogang zhou
>            Priority: Major
>         Attachments: image-2023-02-15-20-26-58-604.png, image-2023-02-15-20-32-17-993.png
>
>
> With setPinL0FilterAndIndexBlocksInCache set to true, we can see the pinned
> memory keep growing (in the picture below, from 48G to 50G in about 5 hours).
> If we switch it to false, the pinned memory stays relatively static. In our
> environment, a lot of tasks restart because they exceed the memory limit and
> are killed by k8s.
> !image-2023-02-15-20-26-58-604.png|width=899,height=447!
>
> !image-2023-02-15-20-32-17-993.png|width=853,height=464!
> The two graphs were recorded yesterday and today, so the number of records
> per second in the data stream should not differ a lot between them.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)