[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166322#comment-17166322 ]
Farnight commented on FLINK-18712:
----------------------------------

[~yunta], below is some testing information based on a simple job. Please help check, thanks a lot!

Flink configs: for the Flink cluster we use session-cluster mode, version 1.10.

TM configs: state.backend.rocksdb.memory.managed is set to `true`. Our k8s pod has 31G memory; managed memory is set to 10G and heap size to 15G. Other settings keep the defaults. (A sketch of this configuration and of the test job is included at the end of this comment.)

Job:
# write a dummy source function that emits events in a for/while loop
# use the default SessionWindow with a gap of 30 minutes
# run the job a few times
# monitor the k8s pod memory working set usage with cAdvisor

Case 1: when running the job on k8s (JM/TM inside a pod container), the memory working set keeps rising. Even after the job is stopped, the working set does not decrease. Eventually the TM process is killed by the oom-killer and restarted (PID changed), and only then is the memory working set reset.

Case 2: when running the job on my local machine (MacBook Pro) without a k8s environment, the issue does not occur.


> Flink RocksDB statebackend memory leak issue
> ---------------------------------------------
>
>              Key: FLINK-18712
>              URL: https://issues.apache.org/jira/browse/FLINK-18712
>          Project: Flink
>       Issue Type: Bug
>       Components: Runtime / State Backends
> Affects Versions: 1.10.0
>         Reporter: Farnight
>         Priority: Critical
>
> When using RocksDB as our state backend, we found that it leads to a memory leak when restarting a job (manually or in a recovery case).
>
> How to reproduce:
> # Increase the RocksDB block cache size (e.g. 1G); this makes it easier to monitor and reproduce.
> # Start a job using the RocksDB state backend.
> # When the RocksDB block cache reaches its maximum size, restart the job, and monitor the memory usage (k8s pod working set) of the TM.
> # Go through steps 2-3 a few more times; memory will keep rising.
>
> Any solution or suggestion for this? Thanks!
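For reference, a minimal flink-conf.yaml sketch of the TM memory setup described in the comment, under the Flink 1.10 (FLIP-49) memory model. The comment does not say which keys were actually used, so mapping the 15G heap to taskmanager.memory.task.heap.size and the checkpoint directory are assumptions here:

{code:yaml}
# Session cluster, Flink 1.10, RocksDB block cache/write buffers bounded by managed memory.
state.backend: rocksdb
state.backend.rocksdb.memory.managed: true
# Placeholder checkpoint location; the real path is not given in the comment.
state.checkpoints.dir: hdfs:///flink/checkpoints

# 31G pod: 10G managed memory for RocksDB, 15G task heap,
# the rest left for framework memory, metaspace and JVM overhead.
taskmanager.memory.managed.size: 10g
taskmanager.memory.task.heap.size: 15g
{code}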
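And a minimal Java sketch of the kind of test job described above (dummy source emitting in a loop, keyed 30-minute session windows). The class names, the key space, and the choice of processing-time sessions are illustrative assumptions; the RocksDB state backend is assumed to be configured via flink-conf.yaml as sketched above:

{code:java}
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SessionWindowMemoryTestJob {

    /** Dummy source that keeps emitting keyed events in a loop. */
    public static class DummySource implements SourceFunction<Tuple2<String, Long>> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Tuple2<String, Long>> ctx) throws Exception {
            long i = 0;
            while (running) {
                synchronized (ctx.getCheckpointLock()) {
                    // Spread events over many keys so session-window state keeps growing.
                    ctx.collect(Tuple2.of("key-" + (i % 100_000), 1L));
                }
                i++;
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.addSource(new DummySource())
                // Key by the first tuple field.
                .keyBy(0)
                // Session windows with a 30-minute gap, as in the description.
                .window(ProcessingTimeSessionWindows.withGap(Time.minutes(30)))
                .sum(1)
                .print();

        env.execute("rocksdb-session-window-memory-test");
    }
}
{code}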
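For step 1 of the reproduction quoted above, a sketch of how the block cache could be enlarged explicitly, assuming state.backend.rocksdb.memory.managed is turned off so that the fixed cache size actually applies (with managed memory enabled, Flink sizes the RocksDB block cache from managed memory instead):

{code:yaml}
# Give RocksDB an explicit 1G block cache so cache growth is easy to observe.
state.backend: rocksdb
state.backend.rocksdb.memory.managed: false
state.backend.rocksdb.block.cache-size: 1gb
{code}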