[ https://issues.apache.org/jira/browse/FLINK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159088#comment-17159088 ]
Xintong Song commented on FLINK-16267: -------------------------------------- Hi [~YordanPavlov], Could you try to turn `state.backend.rocksdb.memory.managed` back to `true`, and increase `taskmanager.memory.jvm-overhead.[min|max]`? > Flink uses more memory than taskmanager.memory.process.size in Kubernetes > ------------------------------------------------------------------------- > > Key: FLINK-16267 > URL: https://issues.apache.org/jira/browse/FLINK-16267 > Project: Flink > Issue Type: Bug > Components: Runtime / Task > Affects Versions: 1.10.0 > Reporter: ChangZhuo Chen (陳昌倬) > Priority: Major > Attachments: flink-conf_1.10.0.yaml, flink-conf_1.9.1.yaml, > oomkilled_taskmanager.log > > Time Spent: 10m > Remaining Estimate: 0h > > This issue is from > [https://stackoverflow.com/questions/60336764/flink-uses-more-memory-than-taskmanager-memory-process-size-in-kubernetes] > h1. Description > * In Flink 1.10.0, we try to use `taskmanager.memory.process.size` to limit > the resource used by taskmanager to ensure they are not killed by Kubernetes. > However, we still get lots of taskmanager `OOMKilled`. The setup is in the > following section. > * The taskmanager log is in attachment [^oomkilled_taskmanager.log]. > h2. Kubernete > * The Kubernetes setup is the same as described in > [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html]. > * The following is resource configuration for taskmanager deployment in > Kubernetes: > {{resources:}} > {{ requests:}} > {{ cpu: 1000m}} > {{ memory: 4096Mi}} > {{ limits:}} > {{ cpu: 1000m}} > {{ memory: 4096Mi}} > h2. Flink Docker > * The Flink docker is built by the following Docker file. > {{FROM flink:1.10-scala_2.11}} > RUN mkdir -p /opt/flink/plugins/s3 && > ln -s /opt/flink/opt/flink-s3-fs-presto-1.10.0.jar /opt/flink/plugins/s3/ > {{RUN ln -s /opt/flink/opt/flink-metrics-prometheus-1.10.0.jar > /opt/flink/lib/}} > h2. Flink Configuration > * The following are all memory related configurations in `flink-conf.yaml` > in 1.10.0: > {{jobmanager.heap.size: 820m}} > {{taskmanager.memory.jvm-metaspace.size: 128m}} > {{taskmanager.memory.process.size: 4096m}} > * We use RocksDB and we don't set `state.backend.rocksdb.memory.managed` in > `flink-conf.yaml`. > ** Use S3 as checkpoint storage. > * The code uses DateStream API > ** input/output are both Kafka. > h2. Project Dependencies > * The following is our dependencies. > {{val flinkVersion = "1.10.0"}}{{libraryDependencies += > "com.squareup.okhttp3" % "okhttp" % "4.2.2"}} > {{libraryDependencies += "com.typesafe" % "config" % "1.4.0"}} > {{libraryDependencies += "joda-time" % "joda-time" % "2.10.5"}} > {{libraryDependencies += "org.apache.flink" %% "flink-connector-kafka" % > flinkVersion}} > {{libraryDependencies += "org.apache.flink" % "flink-metrics-dropwizard" % > flinkVersion}} > {{libraryDependencies += "org.apache.flink" %% "flink-scala" % flinkVersion > % "provided"}} > {{libraryDependencies += "org.apache.flink" %% "flink-statebackend-rocksdb" > % flinkVersion % "provided"}} > {{libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % > flinkVersion % "provided"}} > {{libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.7"}} > {{libraryDependencies += "org.log4s" %% "log4s" % "1.8.2"}} > {{libraryDependencies += "org.rogach" %% "scallop" % "3.3.1"}} > h2. Previous Flink 1.9.1 Configuration > * The configuration we used in Flink 1.9.1 are the following. It does not > have `OOMKilled`. > h3. Kubernetes > {{resources:}} > {{ requests:}} > {{ cpu: 1200m}} > {{ memory: 2G}} > {{ limits:}} > {{ cpu: 1500m}} > {{ memory: 2G}} > h3. Flink 1.9.1 > {{jobmanager.heap.size: 820m}} > {{taskmanager.heap.size: 1024m}} -- This message was sent by Atlassian Jira (v8.3.4#803005)