[ https://issues.apache.org/jira/browse/FLINK-25321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461157#comment-17461157 ]
Gao Fei edited comment on FLINK-25321 at 12/17/21, 1:52 AM:
------------------------------------------------------------

[~wangyang0918] I tried adjusting the overhead fraction from the default 0.1 to 0.3. It works well, but how do I judge how much memory is needed? Which pieces of memory does the overhead mainly contain? There is no specific description in the documentation. I use Native Memory Tracking to track the off-heap memory; is it overhead = thread+code+gc+compiler+internal+symbol? Would I need to set at least 1.5GB here?

There is another problem. I set the total memory of the TM process, but the actual process RSS always exceeds this value. Does that mean the overhead memory is only a relative value and cannot be strictly enforced?

(taskmanager.memory.jvm-overhead.fraction: 0.3)
INFO [] - Final TaskExecutor Memory configuration:
INFO [] -   Total Process Memory:          3.000gb (3221225472 bytes)
INFO [] -     Total Flink Memory:          1.850gb (1986422336 bytes)
INFO [] -       Total JVM Heap Memory:     1.540gb (1653562372 bytes)
INFO [] -         Framework:               128.000mb (134217728 bytes)
INFO [] -         Task:                    1.415gb (1519344644 bytes)
INFO [] -       Total Off-heap Memory:     317.440mb (332859964 bytes)
INFO [] -         Managed:                 0 bytes
INFO [] -         Total JVM Direct Memory: 317.440mb (332859964 bytes)
INFO [] -           Framework:             128.000mb (134217728 bytes)
INFO [] -           Task:                  0 bytes
INFO [] -           Network:               189.440mb (198642236 bytes)
INFO [] -     JVM Metaspace:               256.000mb (268435456 bytes)
INFO [] -     JVM Overhead:                921.600mb (966367680 bytes)

Native Memory Tracking:

Total: reserved=4211MB +32MB, committed=2992MB +517MB

- Java Heap (reserved=1578MB, committed=1578MB +464MB)
    (mmap: reserved=1578MB, committed=1578MB +464MB)

- Class (reserved=1103MB +2MB, committed=89MB +1MB)
    (classes #14013 -213)
    (malloc=3MB #20610 +1596)
    (mmap: reserved=1100MB +2MB, committed=87MB +1MB)

- Thread (reserved=854MB +1MB, committed=854MB +1MB)
    (thread #848 +1)
    (stack: reserved=850MB +1MB, committed=850MB +1MB)
    (malloc=3MB #5077 +6)
    (arena=1MB #1692 +2)

- Code (reserved=252MB +1MB, committed=49MB +6MB)
    (malloc=8MB +1MB #15043 +1500)
    (mmap: reserved=244MB, committed=41MB +5MB)

- GC (reserved=121MB +15MB, committed=121MB +32MB)
    (malloc=31MB +15MB #44400 +9384)
    (mmap: reserved=91MB, committed=91MB +17MB)

- Compiler (reserved=3MB, committed=3MB)
    (malloc=3MB #4000 +134)

- Internal (reserved=262MB +3MB, committed=262MB +3MB)
    (malloc=262MB +3MB #51098 +2499)

- Symbol (reserved=20MB, committed=20MB)
    (malloc=18MB #160625 -83)
    (arena=2MB #1)

- Native Memory Tracking (reserved=5MB, committed=5MB)
    (tracking overhead=5MB)

- Arena Chunk (reserved=11MB +10MB, committed=11MB +10MB)
    (malloc=11MB +10MB)

- Unknown (reserved=3MB, committed=0MB)
    (mmap: reserved=3MB, committed=0MB)


was (Author: jackin853):
I tried adjusting the overhead fraction from the default 0.1 to 0.3. It works well, but how do I judge how much memory is needed? Which pieces of memory does the overhead mainly contain? There is no specific description in the documentation. I use Native Memory Tracking to track the off-heap memory; is it overhead = thread+code+gc+compiler+internal+symbol? Would I need to set at least 1.5GB here?

There is another problem. I set the total memory of the TM process, but the actual process RSS always exceeds this value. Does that mean the overhead memory is only a relative value and cannot be strictly enforced?

(taskmanager.memory.jvm-overhead.fraction: 0.3)
INFO [] - Final TaskExecutor Memory configuration:
INFO [] -   Total Process Memory:          3.000gb (3221225472 bytes)
INFO [] -     Total Flink Memory:          1.850gb (1986422336 bytes)
INFO [] -       Total JVM Heap Memory:     1.540gb (1653562372 bytes)
INFO [] -         Framework:               128.000mb (134217728 bytes)
INFO [] -         Task:                    1.415gb (1519344644 bytes)
INFO [] -       Total Off-heap Memory:     317.440mb (332859964 bytes)
INFO [] -         Managed:                 0 bytes
INFO [] -         Total JVM Direct Memory: 317.440mb (332859964 bytes)
INFO [] -           Framework:             128.000mb (134217728 bytes)
INFO [] -           Task:                  0 bytes
INFO [] -           Network:               189.440mb (198642236 bytes)
INFO [] -     JVM Metaspace:               256.000mb (268435456 bytes)
INFO [] -     JVM Overhead:                921.600mb (966367680 bytes)

!image-2021-12-17-09-51-10-924.png!


> standalone deploy on k8s,pod always OOM killed,actual heap memory usage is normal, gc is normal
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25321
>                 URL: https://issues.apache.org/jira/browse/FLINK-25321
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.3
>         Environment: Flink 1.11.3
> k8s v1.21.0
> standalone deployment
>            Reporter: Gao Fei
>            Priority: Major
>
> Start a cluster on k8s, deployed in standalone mode, with a jobmanager pod
> (1G) and a taskmanager pod (3372MB limit). The total memory configured for
> the Flink TM process is 3072MB and managed memory is set to 0, both of which
> are on the heap memory. Now the pod is always OOM killed, and the total
> process memory always exceeds 3072MB. I saw that the system already uses
> jemalloc, so there is no 64M problem. The application itself has not
> allocated any direct memory. It is strange that the process is always OOM
> killed after a period of time.
>
> INFO [] - Final TaskExecutor Memory configuration:
> INFO [] -   Total Process Memory:          3.000gb (3221225472 bytes)
> INFO [] -     Total Flink Memory:          2.450gb (2630667464 bytes)
> INFO [] -       Total JVM Heap Memory:     2.080gb (2233382986 bytes)
> INFO [] -         Framework:               128.000mb (134217728 bytes)
> INFO [] -         Task:                    1.955gb (2099165258 bytes)
> INFO [] -       Total Off-heap Memory:     378.880mb (397284478 bytes)
> INFO [] -         Managed:                 0 bytes
> INFO [] -         Total JVM Direct Memory: 378.880mb (397284478 bytes)
> INFO [] -           Framework:             128.000mb (134217728 bytes)
> INFO [] -           Task:                  0 bytes
> INFO [] -           Network:               250.880mb (263066750 bytes)
> INFO [] -     JVM Metaspace:               256.000mb (268435456 bytes)
> INFO [] -     JVM Overhead:                307.200mb (322122552 bytes)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
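
The two memory breakdowns in this thread (overhead fraction 0.1 and 0.3) can be re-derived from the 3072MB process size, which helps when judging how a different fraction would shift the budget. The following is a minimal Python sketch, not Flink code. It assumes the JVM overhead is the fraction of total process memory clamped to a 192MB minimum and 1024MB maximum, and that network memory is 0.1 of total Flink memory clamped to 64MB..1024MB; these bounds are not stated in the thread and are assumed to be the Flink 1.11 defaults. The function names are hypothetical. The last helper only adds up the NMT categories named in the comment's own formula; whether those categories map exactly onto Flink's JVM overhead is not confirmed here.

```python
# Sketch only: re-derives the TaskExecutor memory breakdown logged in this thread.
# Assumed (not stated in the thread): jvm-overhead bounds of 192 MB / 1024 MB and
# network fraction 0.1 with 64 MB / 1024 MB bounds. All names are hypothetical.


def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(value, high))


def derive_tm_memory(total_process_mb, overhead_fraction,
                     overhead_min_mb=192.0, overhead_max_mb=1024.0,
                     metaspace_mb=256.0,
                     framework_heap_mb=128.0, framework_offheap_mb=128.0,
                     task_offheap_mb=0.0, managed_mb=0.0,
                     network_fraction=0.1,
                     network_min_mb=64.0, network_max_mb=1024.0):
    """Return the derived memory components in MB as a dict."""
    # JVM overhead is a fraction of the whole process size, within min/max.
    overhead = clamp(total_process_mb * overhead_fraction,
                     overhead_min_mb, overhead_max_mb)
    # What remains after metaspace and overhead is "Total Flink Memory".
    total_flink = total_process_mb - metaspace_mb - overhead
    # Network memory is a fraction of total Flink memory, within min/max.
    network = clamp(total_flink * network_fraction,
                    network_min_mb, network_max_mb)
    # Task heap is whatever is left once every other component is subtracted.
    task_heap = (total_flink - framework_heap_mb - framework_offheap_mb
                 - task_offheap_mb - managed_mb - network)
    return {
        "jvm_overhead": overhead,
        "total_flink": total_flink,
        "network": network,
        "task_heap": task_heap,
    }


# Committed sizes in MB, copied from the NMT output in the comment.
NMT_COMMITTED_MB = {
    "thread": 854,
    "code": 49,
    "gc": 121,
    "compiler": 3,
    "internal": 262,
    "symbol": 20,
}


def proposed_overhead_usage_mb(nmt=NMT_COMMITTED_MB):
    """Sum the categories from the comment's proposed overhead formula."""
    return sum(nmt.values())


if __name__ == "__main__":
    for fraction in (0.1, 0.3):
        budget = derive_tm_memory(3072.0, fraction)
        print(fraction, {name: round(mb, 2) for name, mb in budget.items()})
    print("NMT sum (MB):", proposed_overhead_usage_mb())
```

With the numbers above, the six NMT categories add up to 1309MB committed, while the overhead budget at fraction 0.3 is 921.6MB. If the comment's formula were the right accounting, the budget would still be below observed usage, with thread stacks (854MB across 848 threads) as the largest single item.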