[ https://issues.apache.org/jira/browse/FLINK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264194#comment-17264194 ]
Yordan Pavlov commented on FLINK-16267:
---------------------------------------

I have come back to this ticket to investigate cases where the memory usage of a Kubernetes Pod exceeds what has been requested. Going over the suggestions above, I experimented with the settings:
{code:java}
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.block-cache-usage: true
{code}
In my case the usage starts exceeding the capacity as the job runs. Here is a more detailed description.

I have 4 task managers, each with 2 slots, and I am using RocksDBStateBackend for both checkpointing and state variables. The UI of a single TaskManager shows "Flink Managed Memory: 3.42 GB", and on TaskManager start I see the following log:
{noformat}
INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend - Obtained shared RocksDB cache of size 1833749733 bytes
{noformat}
Looking at the exposed metrics, the block cache capacity is 1528124777 bytes (~1.5 GB), which is below the 1.8 GB logged at startup; I presume the rest goes to checkpoints (?). After one hour of running, the block cache usage is above 1 GB per slot on each TaskManager. Since the value differs per slot, I presume the slot values need to be summed per TaskManager, and that sum should stay under the ~1.5 GB capacity from above. Please correct me where my understanding is wrong. Also, is there anything that can be done to restrain RocksDB from consuming ever more memory (see the configuration sketch after the quoted description below)? Eventually the Pod gets terminated and the job restarted.

> Flink uses more memory than taskmanager.memory.process.size in Kubernetes
> -------------------------------------------------------------------------
>
>                 Key: FLINK-16267
>                 URL: https://issues.apache.org/jira/browse/FLINK-16267
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.10.0
>            Reporter: ChangZhuo Chen (陳昌倬)
>            Priority: Major
>         Attachments: flink-conf_1.10.0.yaml, flink-conf_1.9.1.yaml, oomkilled_taskmanager.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue is from [https://stackoverflow.com/questions/60336764/flink-uses-more-memory-than-taskmanager-memory-process-size-in-kubernetes].
> h1. Description
> * In Flink 1.10.0, we try to use `taskmanager.memory.process.size` to limit the resources used by the taskmanagers so that they are not killed by Kubernetes. However, we still get lots of taskmanager `OOMKilled` events. The setup is described in the following sections.
> * The taskmanager log is in attachment [^oomkilled_taskmanager.log].
> h2. Kubernetes
> * The Kubernetes setup is the same as described in [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html].
> * The following is the resource configuration for the taskmanager deployment in Kubernetes:
> {{resources:}}
> {{  requests:}}
> {{    cpu: 1000m}}
> {{    memory: 4096Mi}}
> {{  limits:}}
> {{    cpu: 1000m}}
> {{    memory: 4096Mi}}
> h2. Flink Docker
> * The Flink Docker image is built from the following Dockerfile:
> {{FROM flink:1.10-scala_2.11}}
> {{RUN mkdir -p /opt/flink/plugins/s3 && ln -s /opt/flink/opt/flink-s3-fs-presto-1.10.0.jar /opt/flink/plugins/s3/}}
> {{RUN ln -s /opt/flink/opt/flink-metrics-prometheus-1.10.0.jar /opt/flink/lib/}}
> h2. Flink Configuration
> * The following are all memory-related configurations in `flink-conf.yaml` in 1.10.0:
> {{jobmanager.heap.size: 820m}}
> {{taskmanager.memory.jvm-metaspace.size: 128m}}
> {{taskmanager.memory.process.size: 4096m}}
> * We use RocksDB and we don't set `state.backend.rocksdb.memory.managed` in `flink-conf.yaml`.
> ** Use S3 as checkpoint storage.
> * The code uses the DataStream API.
> ** Input and output are both Kafka.
> h2. Project Dependencies
> * The following are our dependencies:
> {{val flinkVersion = "1.10.0"}}
> {{libraryDependencies += "com.squareup.okhttp3" % "okhttp" % "4.2.2"}}
> {{libraryDependencies += "com.typesafe" % "config" % "1.4.0"}}
> {{libraryDependencies += "joda-time" % "joda-time" % "2.10.5"}}
> {{libraryDependencies += "org.apache.flink" %% "flink-connector-kafka" % flinkVersion}}
> {{libraryDependencies += "org.apache.flink" % "flink-metrics-dropwizard" % flinkVersion}}
> {{libraryDependencies += "org.apache.flink" %% "flink-scala" % flinkVersion % "provided"}}
> {{libraryDependencies += "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkVersion % "provided"}}
> {{libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided"}}
> {{libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.7"}}
> {{libraryDependencies += "org.log4s" %% "log4s" % "1.8.2"}}
> {{libraryDependencies += "org.rogach" %% "scallop" % "3.3.1"}}
> h2. Previous Flink 1.9.1 Configuration
> * The following is the configuration we used in Flink 1.9.1. It did not get `OOMKilled`.
> h3. Kubernetes
> {{resources:}}
> {{  requests:}}
> {{    cpu: 1200m}}
> {{    memory: 2G}}
> {{  limits:}}
> {{    cpu: 1500m}}
> {{    memory: 2G}}
> h3. Flink 1.9.1
> {{jobmanager.heap.size: 820m}}
> {{taskmanager.heap.size: 1024m}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
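As promised above, here is a minimal flink-conf.yaml sketch of the knobs that, as far as I understand the 1.10 memory model, bound RocksDB memory. The concrete sizes are illustrative assumptions only, not values taken from the attached configs, so please treat this as a starting point rather than a recommendation:
{code:java}
# Keep RocksDB inside Flink managed memory (the 1.10 default); the shared
# block cache and write buffers then share the slot's managed-memory budget.
state.backend.rocksdb.memory.managed: true

# Alternative: give each slot a fixed RocksDB budget instead of a fraction
# of managed memory (512mb is an illustrative value, not from this job).
# state.backend.rocksdb.memory.fixed-per-slot: 512mb

# Leave headroom for native allocations that the cache accounting does not
# fully cover (allocator fragmentation, index/filter blocks, etc.); the
# sizes below are assumptions chosen only to illustrate the options.
taskmanager.memory.jvm-overhead.min: 512mb
taskmanager.memory.jvm-overhead.max: 1gb
# ...or hand RocksDB a smaller share of the process budget:
# taskmanager.memory.managed.fraction: 0.3

# Metrics used above to watch the shared block cache.
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.block-cache-usage: true
{code}
For the 4096m process size in the quoted config, the 1.10 defaults (10% JVM overhead, 40% managed fraction) work out to roughly 410m JVM overhead, 128m metaspace and about 1.4 GB of managed memory per TaskManager, so anything RocksDB allocates beyond its accounted budget eats directly into the container's remaining headroom.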