Thanks everyone for the responses.
I tried out the jemalloc suggestion from FLINK-19125 using a patched 1.11.3
image and so far it appears to be working well. I see it's included in 1.12.1
and Docker images are available, so I'll look at upgrading too.
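A sketch of that kind of patch, for anyone else trying it (the exact package
name and library path depend on the base image, so this is illustrative
rather than the exact Dockerfile used):

FROM flink:1.11.3-scala_2.12

# Install jemalloc and preload it so it replaces glibc malloc for the Flink
# processes started in the container.
RUN apt-get update && \
    apt-get install -y --no-install-recommends libjemalloc2 && \
    rm -rf /var/lib/apt/lists/*

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2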
Best regards,
Randal.
Cc: user
Subject: Re: Memory usage increases on every job restart resulting in eventual OOMKill
Hi
We had something similar, and our problem was class loader leaks. We used a
summary log component to reduce logging, but it still turned out that it used
a static object that wasn't released when we got an OOM or restart. Flink was
reusing task managers, so the only workaround was to stop the job, wait
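A minimal sketch of that kind of leak (the class and names here are invented,
just to illustrate the pattern):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Invented example of the pattern: a "summary" logger that keeps counts in a
// static map so it can log aggregates instead of every single event.
public final class SummaryLogger {

    // Static state lives as long as the class stays loaded. If the class is
    // loaded parent-first (e.g. from a jar in Flink's lib/ directory), the map
    // survives job restarts on a reused task manager; if it is loaded by the
    // per-job classloader, anything global that still references it (threads,
    // ThreadLocals, shutdown hooks) pins that classloader and everything it
    // loaded.
    private static final ConcurrentMap<String, Long> COUNTS = new ConcurrentHashMap<>();

    private SummaryLogger() {
    }

    public static void count(String message) {
        COUNTS.merge(message, 1L, Long::sum);
    }

    // If no cleanup like this runs on failure or restart, the accumulated
    // state is never released.
    public static void reset() {
        COUNTS.clear();
    }
}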
>
> How is the memory measured?
I meant: which Flink or K8s metric is collected? I'm asking because,
depending on which metric is used, the *container memory usage* can be
defined differently, e.g. whether mmap memory is included.
Also, could you share the effective memory configurations for the
t
Hi Xintong Song,
Correct, we are using standalone K8s. Task managers are deployed as a
StatefulSet, so they have consistent pod names. We tried using native K8s (in
fact I'd prefer to), but got persistent
"io.fabric8.kubernetes.client.KubernetesClientException: too old resource
version: 242214695 (242413
Hi Randal,
The image is too blurry to read clearly.
I have a few questions.
- IIUC, you are using the standalone K8s deployment [1], not the native K8s
deployment [2]. Could you confirm that?
- How is the memory measured?
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flin
Hi,
We're running Flink 1.11.3 on Kubernetes. We have a job with a parallelism of
10 running on 10 task managers, each with 1 task slot. The job has 4 time
windows over 2 different keys; 2 windows use reducers and 2 are processed by
window functions. State is stored in RocksDB.
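A simplified sketch of that topology (the class, keys, sources and window
sizes below are placeholders, not the actual job):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class WindowJobSketch {

    /** Placeholder event type; the real job has its own POJO. */
    public static class Event {
        public String keyA;
        public String keyB;
        public long value;
        public long timestamp;

        public Event() {}

        public Event(String keyA, String keyB, long value, long timestamp) {
            this.keyA = keyA;
            this.keyB = keyB;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(10);

        // RocksDB state backend with incremental checkpoints (checkpoint URI is a placeholder).
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/checkpoints", true));

        DataStream<Event> events = env
                .fromElements(
                        new Event("a", "x", 1, 1_000L),
                        new Event("a", "y", 2, 2_000L),
                        new Event("b", "x", 3, 3_000L))
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Event>forMonotonousTimestamps()
                                .withTimestampAssigner((event, ts) -> event.timestamp));

        // Two of the four windows aggregate with a reducer on the first key...
        events.keyBy(e -> e.keyA)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .reduce((a, b) -> new Event(a.keyA, a.keyB, a.value + b.value,
                        Math.max(a.timestamp, b.timestamp)))
                .print();

        // ...and two are handled by a window function on the second key.
        events.keyBy(e -> e.keyB)
                .window(TumblingEventTimeWindows.of(Time.minutes(5)))
                .process(new ProcessWindowFunction<Event, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context ctx, Iterable<Event> elements,
                                        Collector<String> out) {
                        long sum = 0;
                        for (Event e : elements) {
                            sum += e.value;
                        }
                        out.collect(key + "=" + sum);
                    }
                })
                .print();

        env.execute("window-job-sketch");
    }
}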
We've noticed when a