I have run into a similar issue. YARN kills the TaskManagers once their memory usage grows to the limit. I suspect RocksDB is causing the problem. Is there any way to debug the memory usage of the RocksDB state backend?
Best
Yan
________________________________
From: YennieChen88 <chenyanyi...@jd.com>
Sent: Wednesday, August 29, 2018 6:14:11 AM
To: user@flink.apache.org
Subject: Taskmanager process memory increasing always

Hello,

My use case is counting the number of successful and failed logins within 1 hour, 10 min, 5 min, 3 min, 1 min, 10 seconds and 1 second, keyed (keyBy) by login IP or device ID. Based on the previous counts over these time dimensions, we predict the compliance of the next login.

After various attempts, I chose sliding windows for the counting, e.g. a 1 hour window size with a 1 min slide, a 10 min window size with a 10 second slide, a 5 min window size with a 5 second slide... In addition, I use RocksDB as the state backend and have checkpointing enabled. But now I have run into some problems.

1. The RES memory of every TaskManager process keeps increasing and never stabilizes, until the process is killed without any OOM exception in the log.
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1520/memory_usage.png>
After several tests, I found that the increase in process memory is related to the key (IP or device ID). If the key values stay within a fixed range, process memory is stable. If the key values change randomly, memory keeps increasing. In practice, the login IP and device ID keys are random. We also found that logins drop after midnight, and memory is briefly stable then, but it increases again during the day. I started a job 15 days ago, and its memory is still increasing. Is it normal for memory to increase when the keys change randomly?

2. RocksDB seems to take up a lot of memory. If I switch from RocksDB to the file system state backend, memory usage drops to around 30%. If no limit is configured, will the memory used by RocksDB keep increasing forever?

3. Some TaskManagers in the Flink cluster do not run any task (no slot is used), but their memory also increases linearly after the job has run for several days. What are they using the memory for? I have no idea.
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1520/memory_usage2.png>

Hoping for your reply. Thank you.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
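
A rough sketch of why randomly changing keys can inflate window state under the sliding windows described in the original message. It relies on the fact that a sliding window assigner places each element into size / slide overlapping windows, so a key seen even once can create that many per-window entries until those windows expire. Only the three window configurations stated in the message are used. The class name, method names and the key count are hypothetical and for illustration only; this is plain arithmetic, not a measurement of the actual job and not Flink API code.

```java
public class SlidingWindowFanout {

    // {window size in seconds, slide in seconds}, taken from the message:
    // 1 hour / 1 min, 10 min / 10 s, 5 min / 5 s.
    static final long[][] WINDOWS = {
        {3600, 60},
        {600, 10},
        {300, 5},
    };

    /** Number of overlapping sliding windows a single element falls into. */
    public static long windowsPerElement(long sizeSeconds, long slideSeconds) {
        if (slideSeconds <= 0 || sizeSeconds % slideSeconds != 0) {
            // Keep the sketch simple: only sizes that are a multiple of the slide.
            throw new IllegalArgumentException("size must be a positive multiple of slide");
        }
        return sizeSeconds / slideSeconds;
    }

    /** Window entries one key can create across all configured windows. */
    public static long totalEntriesPerKey() {
        long total = 0;
        for (long[] w : WINDOWS) {
            total += windowsPerElement(w[0], w[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        for (long[] w : WINDOWS) {
            System.out.println("size=" + w[0] + "s slide=" + w[1] + "s -> "
                    + windowsPerElement(w[0], w[1]) + " windows per element");
        }
        // Hypothetical number of distinct keys alive at the same time.
        long distinctKeys = 1_000_000L;
        System.out.println("entries per key: " + totalEntriesPerKey());
        System.out.println("upper bound for " + distinctKeys + " keys: "
                + distinctKeys * totalEntriesPerKey());
    }
}
```

Under these assumptions every new key can add up to 180 window entries across the three operators, which is consistent with state staying flat when the key range is fixed and growing when new keys keep arriving.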