Hi Prakhar,

Have you checked the garbage collection behavior of the master (JobManager)? Which version of Flink are you using, and how many TaskManagers are in your cluster?

Prakhar Mathur <prakha...@go-jek.com> wrote on Thu, Jul 18, 2019 at 1:54 PM:

> Hello,
>
> We have deployed multiple Flink clusters on Kubernetes with 1 replica of
> the JobManager and multiple TaskManagers, as required. Recently we have
> observed that when we increase the number of TaskManagers in a cluster,
> the JobManager becomes unresponsive. It stops sending StatsD metrics for
> irregular intervals. The JobManager pod also keeps restarting because
> it stops responding to the liveness probe, which results in Kubernetes
> killing the pod. We tried increasing the resources given (CPU, RAM), but it
> didn't help.
>
> Regards,
> Prakhar Mathur
> Product Engineer
> GO-JEK