Hi. I upgraded to Flink v1.14.4, and now my Flink TaskManagers are being killed by Kubernetes for exceeding their requested memory. Each TM is using roughly 5 GB more memory than taskmanager.memory.process.size.
Here are the flink-config values that I'm using:

taskmanager.memory.process.size: 25600mb  # The default, 256mb, is too small.
taskmanager.memory.jvm-metaspace.size: 320mb
taskmanager.memory.network.fraction: 0.2
taskmanager.memory.network.max: 2560m

I'm requesting 26112Mi in my Kubernetes config (so there's some buffer).

I re-read the Flink docs <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup/> on setting memory. This seems like it should be fine. The diagrams and docs show that process.size is used.

If it helps, the TMs are failing in a round robin once every ~30 minutes or so. This isn't an issue with Flink v1.12.3 but is an issue with Flink v1.14.4. My text logs have a bunch of Kafka connections in them. I don't know if that's related to overallocating memory.

❯ kubectl -n flink-v1-14-4 get events
LAST SEEN   TYPE      REASON                OBJECT                          MESSAGE
37m         Warning   Evicted               pod/flink-taskmanager-3         The node was low on resource: memory. Container taskmanager was using 31457992Ki, which exceeds its request of 26112Mi.
37m         Normal    Killing               pod/flink-taskmanager-3         Stopping container taskmanager
37m         Normal    Scheduled             pod/flink-taskmanager-3         Successfully assigned hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager-3 to ip-10-12-104-15.ec2.internal
37m         Normal    Pulled                pod/flink-taskmanager-3         Container image "flink:1.14.4" already present on machine
37m         Normal    Created               pod/flink-taskmanager-3         Created container taskmanager
37m         Normal    Started               pod/flink-taskmanager-3         Started container taskmanager
37m         Normal    SuccessfulCreate      statefulset/flink-taskmanager   create Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful
37m         Warning   RecreatingFailedPod   statefulset/flink-taskmanager   StatefulSet hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager is recreating failed Pod flink-taskmanager-3
37m         Normal    SuccessfulDelete      statefulset/flink-taskmanager   delete Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful
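In case it helps, here is the arithmetic behind the ~5 GB figure as a small Python sketch. It uses only the numbers from the config and the eviction event above. It assumes that Flink's `mb` and Kubernetes' `Mi` both mean mebibytes (1024-based); if that assumption is off, the gap shifts slightly. The variable names are just for illustration.

```python
# Values copied from the flink-config and the Evicted event above.
# Assumption: Flink "mb" and Kubernetes "Mi" are both 1024-based mebibytes.
PROCESS_SIZE_MIB = 25600    # taskmanager.memory.process.size
K8S_REQUEST_MIB = 26112     # memory request in the Kubernetes config
OBSERVED_KIB = 31457992     # container usage reported in the Evicted event


def kib_to_mib(kib: int) -> float:
    """Convert kibibytes to mebibytes."""
    return kib / 1024


observed_mib = kib_to_mib(OBSERVED_KIB)

# Headroom between the Kubernetes request and the Flink process size.
buffer_mib = K8S_REQUEST_MIB - PROCESS_SIZE_MIB

# How far actual usage exceeded each limit.
over_process_mib = observed_mib - PROCESS_SIZE_MIB
over_request_mib = observed_mib - K8S_REQUEST_MIB

print(f"observed usage:     {observed_mib:.1f} MiB")
print(f"request buffer:     {buffer_mib} MiB")
print(f"over process.size:  {over_process_mib:.1f} MiB ({over_process_mib / 1024:.2f} GiB)")
print(f"over k8s request:   {over_request_mib:.1f} MiB ({over_request_mib / 1024:.2f} GiB)")
```

So the container was about 5 GiB over process.size and about 4.5 GiB over the Kubernetes request, far more than the 512 MiB buffer I left.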