Hi.

I upgraded to Flink v1.14.4, and now my Flink TaskManagers are being killed
by Kubernetes for exceeding their requested memory. My Flink TM is using an
extra ~5 GB of memory over taskmanager.memory.process.size.

Here are the flink-config values that I'm using:
    taskmanager.memory.process.size: 25600mb
    # The default, 256mb, is too small.
    taskmanager.memory.jvm-metaspace.size: 320mb
    taskmanager.memory.network.fraction: 0.2
    taskmanager.memory.network.max: 2560m

I'm requesting 26112Mi in my Kubernetes config, which leaves 512Mi of buffer over process.size.

I re-read the Flink docs on setting up memory
(<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup/>),
and this seems like it should be fine. The diagrams and docs show
taskmanager.memory.process.size as the total memory of the TaskManager process.
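For anyone checking my numbers, here's a rough Python sketch of how the docs describe splitting process.size into components. The defaults it uses (JVM overhead fraction 0.1 clamped between 192mb and 1gb, managed fraction 0.4, 128mb each of framework heap and framework off-heap, no task off-heap) are my assumption of the 1.14 defaults, the function and parameter names are made up for illustration, and Flink's own rounding is ignored.

```python
# Rough sketch of Flink's documented TaskManager memory derivation when only
# taskmanager.memory.process.size is set. Defaults are ASSUMED to match the
# documented Flink 1.14 defaults; Flink's own rounding is ignored.
# All sizes are in MiB (Flink's "mb"/"m" suffixes are binary units).


def clamp(value: float, lo: float, hi: float) -> float:
    """Bound a fraction-derived size between its configured min and max."""
    return max(lo, min(value, hi))


def tm_breakdown(process_mib: float,
                 metaspace_mib: float = 256,
                 network_fraction: float = 0.1,
                 network_min_mib: float = 64,
                 network_max_mib: float = 1024) -> dict:
    # JVM overhead: 0.1 of total process memory, clamped to [192, 1024] MiB (assumed defaults).
    overhead = clamp(0.1 * process_mib, 192, 1024)
    # Total Flink memory is what remains after metaspace and JVM overhead.
    total_flink = process_mib - metaspace_mib - overhead
    # Network memory: fraction of total Flink memory, clamped to [min, max].
    network = clamp(network_fraction * total_flink, network_min_mib, network_max_mib)
    # Managed memory: 0.4 of total Flink memory (assumed default fraction).
    managed = 0.4 * total_flink
    framework_heap = 128.0      # assumed default
    framework_off_heap = 128.0  # assumed default
    task_off_heap = 0.0         # assumed default
    # Task heap gets whatever is left of total Flink memory.
    task_heap = (total_flink - network - managed
                 - framework_heap - framework_off_heap - task_off_heap)
    return {
        "jvm_overhead": overhead,
        "jvm_metaspace": metaspace_mib,
        "network": network,
        "managed": managed,
        "framework_heap": framework_heap,
        "framework_off_heap": framework_off_heap,
        "task_off_heap": task_off_heap,
        "task_heap": task_heap,
    }


if __name__ == "__main__":
    # Values from my flink-config above.
    parts = tm_breakdown(25600, metaspace_mib=320,
                         network_fraction=0.2, network_max_mib=2560)
    for name, size in parts.items():
        print(f"{name:>18}: {size:8.1f} MiB")
    print(f"{'sum':>18}: {sum(parts.values()):8.1f} MiB")
```

With my config this works out to 1024 MiB of JVM overhead, 2560 MiB of network memory (the 0.2 fraction gets capped by network.max), about 9702 MiB of managed memory and about 11738 MiB of task heap, which sum back to 25600 MiB. As far as I understand, this is only the budget Flink plans with: the JVM caps heap, direct memory and metaspace, but not every native allocation, so the breakdown alone doesn't explain where the extra ~5 GB comes from.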

If it helps, the TMs are failing round-robin, one roughly every ~30 minutes.
This isn't an issue with Flink v1.12.3, only with Flink v1.14.4.

My text logs contain a lot of Kafka connection messages. I don't know if
that's related to the excess memory usage.

    ❯ kubectl -n flink-v1-14-4 get events

    LAST SEEN   TYPE      REASON                OBJECT                          MESSAGE
    37m         Warning   Evicted               pod/flink-taskmanager-3         The node was low on resource: memory. Container taskmanager was using 31457992Ki, which exceeds its request of 26112Mi.
    37m         Normal    Killing               pod/flink-taskmanager-3         Stopping container taskmanager
    37m         Normal    Scheduled             pod/flink-taskmanager-3         Successfully assigned hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager-3 to ip-10-12-104-15.ec2.internal
    37m         Normal    Pulled                pod/flink-taskmanager-3         Container image "flink:1.14.4" already present on machine
    37m         Normal    Created               pod/flink-taskmanager-3         Created container taskmanager
    37m         Normal    Started               pod/flink-taskmanager-3         Started container taskmanager
    37m         Normal    SuccessfulCreate      statefulset/flink-taskmanager   create Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful
    37m         Warning   RecreatingFailedPod   statefulset/flink-taskmanager   StatefulSet hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager is recreating failed Pod flink-taskmanager-3
    37m         Normal    SuccessfulDelete      statefulset/flink-taskmanager   delete Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful
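To double-check the numbers in the eviction event, here's a small Python sketch that converts the Kubernetes quantities to MiB and compares them with process.size. It only handles the binary suffixes (Ki, Mi, Gi, Ti) and plain byte counts, and `parse_quantity`/`summarize` are hypothetical helper names, not part of any Kubernetes client library. It also assumes Flink's "mb" means MiB.

```python
# Hypothetical helpers for comparing the eviction event with the Flink config.
# Only binary-suffixed Kubernetes quantities (Ki/Mi/Gi/Ti) or plain byte counts
# are supported; decimal suffixes like "M" or "G" raise ValueError.

BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}
MIB = 1024 ** 2


def parse_quantity(q: str) -> int:
    """Convert a quantity such as '26112Mi' or '31457992Ki' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # no suffix: already bytes


def summarize(used: str, request: str, process_size_mib: int) -> dict:
    """Compare container usage against the pod request and Flink's process.size."""
    used_b = parse_quantity(used)
    request_b = parse_quantity(request)
    process_b = process_size_mib * MIB  # assumes Flink "mb" == MiB
    return {
        "used_mib": used_b / MIB,
        "headroom_mib": (request_b - process_b) / MIB,
        "over_request_mib": (used_b - request_b) / MIB,
        "over_process_mib": (used_b - process_b) / MIB,
    }


if __name__ == "__main__":
    # Numbers taken from the Evicted event and my flink-config.
    report = summarize("31457992Ki", "26112Mi", 25600)
    for key, value in report.items():
        print(f"{key}: {value:.1f}")
```

Run as-is, it prints used_mib: 30720.7, headroom_mib: 512.0, over_request_mib: 4608.7 and over_process_mib: 5120.7, i.e. about 5 GiB over process.size and about 4.5 GiB over the pod's request.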
