Hi Dan,

Assuming from previous mails that you are using RocksDB, this could be related to the glibc bug [1][2]. I'm never sure in which setups this has already been taken care of. However, your situation is very typical with glibc as the allocator underneath RocksDB, and giving it more memory won't help much.

Greetings
Thias

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#switching-the-memory-allocator
[2] https://issues.apache.org/jira/browse/FLINK-19125

From: Yang Wang <danrtsey...@gmail.com>
Sent: Thursday, April 21, 2022 9:19 AM
To: Dan Hill <quietgol...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: Kubernetes killing TaskManager - Flink ignoring taskmanager.memory.process.size

Could you please configure more memory to avoid the OOM and use NMT (Native Memory Tracking) [1] to figure out the memory usage by category?

[1] https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html

Best,
Yang

Dan Hill <quietgol...@gmail.com> wrote on Thu, Apr 21, 2022 at 07:42:

Hi. I upgraded to Flink v1.14.4 and now my Flink TaskManagers are being killed by Kubernetes for exceeding the requested memory. My Flink TM is using an extra ~5gb of memory over taskmanager.memory.process.size.

Here are the flink-config values that I'm using:

    taskmanager.memory.process.size: 25600mb
    # The default, 256mb, is too small.
    taskmanager.memory.jvm-metaspace.size: 320mb
    taskmanager.memory.network.fraction: 0.2
    taskmanager.memory.network.max: 2560m

I'm requesting 26112Mi in my Kubernetes config (so there's some buffer).

I re-read the Flink docs <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup/> on setting memory. This seems like it should be fine. The diagrams and docs show that process.size is used.

If it helps, the TMs are failing in a round robin, one every ~30 minutes or so. This isn't an issue with Flink v1.12.3 but is an issue with Flink v1.14.4.

My text logs have a bunch of Kafka connections in them. I don't know if that's related to overallocating memory.

❯ kubectl -n flink-v1-14-4 get events
LAST SEEN   TYPE      REASON                OBJECT                          MESSAGE
37m         Warning   Evicted               pod/flink-taskmanager-3         The node was low on resource: memory. Container taskmanager was using 31457992Ki, which exceeds its request of 26112Mi.
37m         Normal    Killing               pod/flink-taskmanager-3         Stopping container taskmanager
37m         Normal    Scheduled             pod/flink-taskmanager-3         Successfully assigned hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager-3 to ip-10-12-104-15.ec2.internal
37m         Normal    Pulled                pod/flink-taskmanager-3         Container image "flink:1.14.4" already present on machine
37m         Normal    Created               pod/flink-taskmanager-3         Created container taskmanager
37m         Normal    Started               pod/flink-taskmanager-3         Started container taskmanager
37m         Normal    SuccessfulCreate      statefulset/flink-taskmanager   create Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful
37m         Warning   RecreatingFailedPod   statefulset/flink-taskmanager   StatefulSet hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager is recreating failed Pod flink-taskmanager-3
37m         Normal    SuccessfulDelete      statefulset/flink-taskmanager   delete Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful
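
For reference, the numbers in the eviction event can be checked against the configured sizes with a little unit arithmetic. The sketch below is only an illustration: it assumes that Flink's "mb" suffix and the Kubernetes "Mi" suffix both mean mebibytes, and the variable names are made up for this note. All three values are copied from the mail above.

```python
# Kubernetes "Ki"/"Mi"/"Gi" are binary units: 1 Mi = 1024 Ki, 1 Gi = 1024 Mi.
KI_PER_MI = 1024
MI_PER_GI = 1024


def ki_to_mi(ki: int) -> float:
    """Convert kibibytes to mebibytes."""
    return ki / KI_PER_MI


# Values taken from the thread (names are illustrative only).
used_ki = 31457992        # "Container taskmanager was using 31457992Ki"
request_mi = 26112        # Kubernetes memory request for the TaskManager pod
process_size_mi = 25600   # taskmanager.memory.process.size, assumed to be MiB

used_mi = ki_to_mi(used_ki)
over_request_mi = used_mi - request_mi
over_process_mi = used_mi - process_size_mi

print(f"container usage:         {used_mi:.1f} Mi ({used_mi / MI_PER_GI:.2f} Gi)")
print(f"above Kubernetes request: {over_request_mi:.1f} Mi")
print(f"above process.size:       {over_process_mi:.1f} Mi "
      f"({over_process_mi / MI_PER_GI:.2f} Gi)")
```

Under these assumptions the container was at roughly 30 Gi when it was evicted, which is about 5 Gi above taskmanager.memory.process.size and matches the "extra ~5gb" reported above.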