[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166322#comment-17166322
 ] 

Farnight commented on FLINK-18712:
----------------------------------

[~yunta], below is some testing information based on a simple job. Please 
help check it. Thanks a lot!

 

Flink configs:

For the Flink cluster, we use session-cluster mode.

Version: 1.10

 

TM configs:

state.backend.rocksdb.memory.managed is set to `true`.

Our k8s pod has 31G of memory.

Managed memory is set to 10G.

Heap size is set to 15G.

Other settings keep their defaults.
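
For reference, this maps roughly to the flink-conf.yaml entries below (key 
names from the Flink 1.10 memory model; values as listed above, so this is a 
sketch of our setup rather than the exact file):

{code}
# TaskManager memory settings (Flink 1.10 memory model)
taskmanager.memory.task.heap.size: 15g     # "heap size set to 15G"
taskmanager.memory.managed.size: 10g       # "managed memory set to 10G"

# Let RocksDB size its block cache / write buffers from Flink managed memory
state.backend.rocksdb.memory.managed: true

# (The 31G figure is the k8s pod memory limit set on the pod spec,
#  not a Flink option.)
{code}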

 

Job:
 # Write a dummy source function that emits events in a for/while loop (sketched below).
 # Use the default session window with a 30-minute gap.
 # Run the job a few times.
 # Monitor the k8s pod's memory working set usage via cAdvisor.
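
A minimal sketch of the test job, just to show its shape (the class names, 
key space, processing-time session windows and the sum aggregation are 
illustrative choices, not necessarily what ran on the cluster):

{code:java}
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SessionWindowMemoryTest {

    /** Dummy source that emits events in a loop until cancelled (step 1). */
    public static class DummySource implements SourceFunction<Tuple2<String, Long>> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Tuple2<String, Long>> ctx) throws Exception {
            long i = 0;
            while (running) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(Tuple2.of("key-" + (i % 1000), i));
                }
                i++;
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.addSource(new DummySource())
                .keyBy(t -> t.f0)
                // Session window with a 30-minute gap (step 2); window state lives
                // in RocksDB when the cluster uses the RocksDB state backend.
                .window(ProcessingTimeSessionWindows.withGap(Time.minutes(30)))
                .sum(1)
                .print();

        env.execute("rocksdb-session-window-memory-test");
    }
}
{code}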

 

 

Case 1:

When running the job on k8s (JM/TM inside pod containers), the memory working 
set keeps rising. Even after the job is stopped, the working set does not 
decrease. Eventually the TM process gets killed by the oom-killer and is 
restarted (the PID changes); only then is the memory working set reset.

 

Case 2:

When running the job on my local machine (MacBook Pro) without a k8s 
environment, it does not have this issue.

> Flink RocksDB statebackend memory leak issue 
> ---------------------------------------------
>
>                 Key: FLINK-18712
>                 URL: https://issues.apache.org/jira/browse/FLINK-18712
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.10.0
>            Reporter: Farnight
>            Priority: Critical
>
> When using RocksDB as our state backend, we found it leads to a memory leak 
> when restarting a job (manually or in a recovery case).
>  
> How to reproduce:
>  # Increase the RocksDB block cache size (e.g. to 1G); this makes it easier 
> to monitor and reproduce.
>  # Start a job using the RocksDB state backend.
>  # When the RocksDB block cache reaches its maximum size, restart the job, 
> and monitor the memory usage (k8s pod working set) of the TM.
>  # Go through steps 2-3 a few more times, and the memory will keep rising.
>  
> Any solution or suggestion for this? Thanks!
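
One way to do step 1 of the reproduction quoted above is via configuration, 
roughly like the snippet below (key name from Flink's RocksDB configurable 
options; as far as I understand it only takes effect when managed memory is 
not controlling RocksDB, so treat it as a sketch rather than the exact setup 
used in our runs):

{code}
# Enlarge the RocksDB block cache so its growth is easy to observe (step 1).
state.backend.rocksdb.block.cache-size: 1gb

# With state.backend.rocksdb.memory.managed: true the cache is sized from
# Flink managed memory instead, so disable it for this particular experiment.
state.backend.rocksdb.memory.managed: false
{code}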



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
