Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-03 Thread Randal Pitt
Thanks everyone for the responses. I tried out the JeMalloc suggestion from FLINK-19125 using a patched 1.11.3 image and so far it appears to be working well. I see it's included in 1.12.1 and Docker images are available so I'll look at upgrading too. Best regards, Randal. -- Sent from: http://a

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-03 Thread Yun Tang
Cc: user Subject: Re: Memory usage increases on every job restart resulting in eventual OOMKill Hi, We had something similar and our problem was class loader leaks. We used a summary log component to reduce logging, but it still turned out that it used a static object that wasn’t released when we

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Lasse Nedergaard
Hi, We had something similar and our problem was class loader leaks. We used a summary log component to reduce logging, but it still turned out that it used a static object that wasn’t released when we got an OOM or restart. Flink was reusing task managers, so the only workaround was to stop the job wait

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Xintong Song
> > How is the memory measured? I meant which Flink or K8s metric is collected? I'm asking because depending on which metric is used, the *container memory usage* can be defined differently. E.g., whether mmap memory is included. Also, could you share the effective memory configurations for the t

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Randal Pitt
Hi Xintong Song, Correct, we are using standalone k8s. Task managers are deployed as a StatefulSet so they have consistent pod names. We tried using native k8s (in fact I'd prefer to) but got persistent "io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 242214695 (242413

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Xintong Song
Hi Randal, The image is too blurred to be clearly seen. I have a few questions. - IIUC, you are using the standalone K8s deployment [1], not the native K8s deployment [2]. Could you confirm that? - How is the memory measured? Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flin

Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Randal Pitt
Hi, We're running Flink 1.11.3 on Kubernetes. We have a job with parallelism of 10 running on 10 task managers each with 1 task slot. The job has 4 time windows with 2 different keys, 2 windows have reducers and 2 are processed by window functions. State is stored in RocksDB. We've noticed when a