[ https://issues.apache.org/jira/browse/FLINK-25321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461157#comment-17461157 ]
Gao Fei commented on FLINK-25321:
---------------------------------

I tried adjusting the overhead ratio from the default 0.1 to 0.3 (taskmanager.memory.jvm-overhead.fraction: 0.3), and it works well, but how do I judge how much memory is actually needed? The documentation does not describe in detail which pieces of memory the JVM overhead covers. I am using Native Memory Tracking to inspect the off-heap memory; is overhead = thread + code + gc + compiler + internal + symbol? Would I need to set at least 1.5 GB here? There is another problem: I configured the total process memory of the TM, but the actual RSS of the process always exceeds this value. Does that mean the overhead memory is only a relative budget and cannot be strictly enforced?

INFO  [] - Final TaskExecutor Memory configuration:
INFO  [] -   Total Process Memory:          3.000gb (3221225472 bytes)
INFO  [] -     Total Flink Memory:          1.850gb (1986422336 bytes)
INFO  [] -       Total JVM Heap Memory:     1.540gb (1653562372 bytes)
INFO  [] -         Framework:               128.000mb (134217728 bytes)
INFO  [] -         Task:                    1.415gb (1519344644 bytes)
INFO  [] -       Total Off-heap Memory:     317.440mb (332859964 bytes)
INFO  [] -         Managed:                 0 bytes
INFO  [] -         Total JVM Direct Memory: 317.440mb (332859964 bytes)
INFO  [] -           Framework:             128.000mb (134217728 bytes)
INFO  [] -           Task:                  0 bytes
INFO  [] -           Network:               189.440mb (198642236 bytes)
INFO  [] -     JVM Metaspace:               256.000mb (268435456 bytes)
INFO  [] -     JVM Overhead:                921.600mb (966367680 bytes)

!image-2021-12-17-09-51-10-924.png!

> standalone deploy on k8s, pod always OOM killed, actual heap memory usage is normal, gc is normal
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25321
>                 URL: https://issues.apache.org/jira/browse/FLINK-25321
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.3
>         Environment: Flink 1.11.3
>                      k8s v1.21.0
>                      standalone deployment
>            Reporter: Gao Fei
>            Priority: Major
>
> I start a cluster on k8s, deployed in standalone mode, with one JobManager pod (1 GB) and one TaskManager pod (3372 MB limit). The total process memory of the Flink TM is configured as 3072 MB and managed memory is configured as 0, so all of it is placed on the heap. The pod keeps getting OOM killed, and the total process memory keeps exceeding 3072 MB. The system already uses jemalloc, so there is no 64 MB arena problem, and the application itself does not allocate direct memory. It is strange that the process is always OOM killed after running for a while.
>
> INFO  [] - Final TaskExecutor Memory configuration:
> INFO  [] -   Total Process Memory:          3.000gb (3221225472 bytes)
> INFO  [] -     Total Flink Memory:          2.450gb (2630667464 bytes)
> INFO  [] -       Total JVM Heap Memory:     2.080gb (2233382986 bytes)
> INFO  [] -         Framework:               128.000mb (134217728 bytes)
> INFO  [] -         Task:                    1.955gb (2099165258 bytes)
> INFO  [] -       Total Off-heap Memory:     378.880mb (397284478 bytes)
> INFO  [] -         Managed:                 0 bytes
> INFO  [] -         Total JVM Direct Memory: 378.880mb (397284478 bytes)
> INFO  [] -           Framework:             128.000mb (134217728 bytes)
> INFO  [] -           Task:                  0 bytes
> INFO  [] -           Network:               250.880mb (263066750 bytes)
> INFO  [] -     JVM Metaspace:               256.000mb (268435456 bytes)
> INFO  [] -     JVM Overhead:                307.200mb (322122552 bytes)
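For reference, a minimal flink-conf.yaml sketch of the TaskManager memory settings discussed in the comment above, assuming Flink 1.11. The process size, overhead fraction and managed size mirror the values reported in this issue; the explicit jvm-overhead min/max bounds and the Native Memory Tracking flag passed via env.java.opts.taskmanager are illustrative additions for diagnosis, not taken from the original report.

{code:yaml}
# flink-conf.yaml (sketch) -- values mirror this issue where noted

# Total memory of the TaskManager JVM process (heap + off-heap + metaspace + JVM overhead).
taskmanager.memory.process.size: 3072m

# Managed memory disabled, as in the report.
taskmanager.memory.managed.size: 0

# JVM overhead is derived as a fraction of the total process size and
# clamped into [min, max]; 0.3 * 3072m = 921.6m here.
taskmanager.memory.jvm-overhead.fraction: 0.3
taskmanager.memory.jvm-overhead.min: 192m   # assumption: default lower bound
taskmanager.memory.jvm-overhead.max: 1g     # assumption: default upper bound

# Assumption, for diagnosis only: enable JVM Native Memory Tracking so the
# JVM-side native categories (thread, code, gc, compiler, internal, symbol)
# can be read with jcmd at runtime. NMT itself adds some memory/CPU overhead.
env.java.opts.taskmanager: -XX:NativeMemoryTracking=summary
{code}

Note that the JVM overhead budget is only used when deriving the sizes of the other components; it is not backed by a JVM flag of its own, so native allocations outside the heap, direct-memory and metaspace limits can still push the RSS past the configured process size.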
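A rough sketch of how the NMT output could then be compared against the process RSS inside the pod. The pod name and PID are placeholders, the commands assume a shell and a JDK with jcmd inside the image, and the cgroup path assumes cgroup v1.

{code:bash}
# Open a shell in the TaskManager pod (placeholder pod name).
kubectl exec -it <taskmanager-pod> -- /bin/sh

# Inside the container: list Java processes to find the TaskManager PID.
jcmd

# JVM-tracked native memory, broken down into thread, code, gc, compiler,
# internal, symbol, etc. Note that NMT does not see allocations made outside
# the JVM (e.g. by native libraries), so the sum can still be below the RSS.
jcmd <pid> VM.native_memory summary

# Resident set size of the process and the memory usage the cgroup accounts
# against the pod limit (cgroup v1 path).
ps -o rss= -p <pid>
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
{code}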