[ https://issues.apache.org/jira/browse/FLINK-36394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
JieTan updated FLINK-36394:
---------------------------
    Description: 
We should set the initial JVM Metaspace size to a sane default, such as {{-XX:MetaspaceSize=128m}}.

When a Flink job starts, the JVM has to load class metadata, including class structure, method, and field information. Metaspace usage therefore grows quickly, and each time it crosses the current threshold the JVM performs a Full GC (Metadata GC Threshold). These collections reclaim no Metaspace (for example 20594K->20594K in the log below). Sometimes, these Metadata GC Threshold Full GCs last 3.7s before the Akka system is started.

{code:java}
2024-07-23T20:08:40.554+0800: 1.801: [Full GC (Metadata GC Threshold) 2024-07-23T20:08:40.554+0800: 1.801: [Tenured: 0K->14029K(966656K), 0.0336205 secs] 108273K->14029K(1401664K), [Metaspace: 20594K->20594K(1067008K)], 0.0337494 secs] [Times: user=0.02 sys=0.00, real=0.04 secs]
2024-07-23T20:08:42.052+0800: 3.300: [Full GC (Metadata GC Threshold) 2024-07-23T20:08:42.052+0800: 3.300: [Tenured: 14029K->28979K(966656K), 0.0503497 secs] 184494K->28979K(1401664K), [Metaspace: 34384K->34384K(1079296K)], 0.0505108 secs] [Times: user=0.04 sys=0.00, real=0.05 secs]
2024-07-23T20:08:45.064+0800: 6.312: [Full GC (Metadata GC Threshold) 2024-07-23T20:08:45.064+0800: 6.312: [Tenured: 28979K->54550K(966656K), 0.0876744 secs] 248437K->54550K(1401664K), [Metaspace: 57015K->57015K(1099776K)], 0.0879053 secs] [Times: user=0.07 sys=0.01, real=0.09 secs]
{code}

Metaspace usage eventually exceeds 100 MB.

h2. *Solution:*
Configure the initial Metaspace size by default, so that Flink jobs trigger fewer Metaspace-induced Full GCs during startup:
 * Add *jobmanager.memory.jvm-init-metaspace.size* to JobManagerOptions.
 * Add *taskmanager.memory.jvm-init-metaspace.size* to TaskManagerOptions.

JVM flags from the GC log header:
{code:java}
OpenJDK 64-Bit Server VM (25.362-b09) for linux-amd64 JRE (1.8.0_362-ByteOpenJDK-b09), built on Feb 20 2023 09:42:31 by "root" with gcc 8.3.0
Memory: 4k page, physical 4018480k(137636k free), swap 0k(0k free)
CommandLine flags: -XX:CompressedClassSpaceSize=260046848 -XX:GCLogFileSize=104857600 -XX:InitialHeapSize=3368026112 -XX:MaxHeapSize=3368026112 -XX:MaxMetaspaceSize=268435456 -XX:MetaspaceSize=134217728 -XX:NumberOfGCLogFiles=5 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseGCLogFileRotation
{code}

  was:
We should set the size of the Init JVM Metaspace to a sane default, like {{{}-XX:MetaspaceSize=128m{}}}.

When a Flink job starts, it needs to load class metadata information, including class structure information, method information, and field information. As a result, the JVM frequently performs FullGC. Sometimes, the Metadata GC Threshold (Full GC) is 3.7s long before the Akka System is started.
{code:java}
2024-07-23T20:08:40.554+0800: 1.801: [Full GC (Metadata GC Threshold) 2024-07-23T20:08:40.554+0800: 1.801: [Tenured: 0K->14029K(966656K), 0.0336205 secs] 108273K->14029K(1401664K), [Metaspace: 20594K->20594K(1067008K)], 0.0337494 secs] [Times: user=0.02 sys=0.00, real=0.04 secs]
2024-07-23T20:08:42.052+0800: 3.300: [Full GC (Metadata GC Threshold) 2024-07-23T20:08:42.052+0800: 3.300: [Tenured: 14029K->28979K(966656K), 0.0503497 secs] 184494K->28979K(1401664K), [Metaspace: 34384K->34384K(1079296K)], 0.0505108 secs] [Times: user=0.04 sys=0.00, real=0.05 secs]
2024-07-23T20:08:45.064+0800: 6.312: [Full GC (Metadata GC Threshold) 2024-07-23T20:08:45.064+0800: 6.312: [Tenured: 28979K->54550K(966656K), 0.0876744 secs] 248437K->54550K(1401664K), [Metaspace: 57015K->57015K(1099776K)], 0.0879053 secs] [Times: user=0.07 sys=0.01, real=0.09 secs]
{code}

The final Metaspace size takes up 100MB+.

h2. *Solution:*
{*}I{*}f we configure Init Metaspace size by default. As a result, Flink job Metaspace FullGC reduced.
 * Add *jobmanager.memory.jvm-init-metaspace.size* to JobManagerOptions.
 * Add *taskmanager.memory.jvm-init-metaspace.size* to JobManagerOptions

{code:java}
OpenJDK 64-Bit Server VM (25.362-b09) for linux-amd64 JRE (1.8.0_362-ByteOpenJDK-b09), built on Feb 20 2023 09:42:31 by "root" with gcc 8.3.0
Memory: 4k page, physical 4018480k(137636k free), swap 0k(0k free)
CommandLine flags: -XX:CompressedClassSpaceSize=260046848 -XX:GCLogFileSize=104857600 -XX:InitialHeapSize=3368026112 -XX:MaxHeapSize=3368026112 -XX:MaxMetaspaceSize=268435456 -XX:MetaspaceSize=134217728 -XX:NumberOfGCLogFiles=5 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseGCLogFileRotation
{code}


> Configure Init Metaspace size by default
> ----------------------------------------
>
>                 Key: FLINK-36394
>                 URL: https://issues.apache.org/jira/browse/FLINK-36394
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core
>    Affects Versions: 1.16.0, 1.17.2
>            Reporter: JieTan
>            Priority: Major
>             Fix For: 1.16.0, 1.17.0
>
>         Attachments: JVMMetaspace.png, JobManagerJVMOptions.png, TaskManagerJVMOptions.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
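A minimal sketch of how the proposed options could be turned into JVM flags, assuming the new *.memory.jvm-init-metaspace.size value is rendered as {{-XX:MetaspaceSize}} next to the {{-XX:MaxMetaspaceSize}} flag that Flink already derives from *.memory.jvm-metaspace.size. The class {{MetaspaceJvmArgs}} and the method {{metaspaceArgs}} are hypothetical and are not Flink's actual API. With 128 MiB and 256 MiB it produces the same {{-XX:MetaspaceSize=134217728}} and {{-XX:MaxMetaspaceSize=268435456}} values that appear in the GC log header. For context, the first Full GC in the log fires at about 20 MB of Metaspace, which is close to the HotSpot default initial threshold of roughly 21 MB on 64-bit JVMs.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Flink's actual API): renders Metaspace JVM flags from
 * the proposed *.memory.jvm-init-metaspace.size and the existing
 * *.memory.jvm-metaspace.size values, both given in bytes.
 */
public class MetaspaceJvmArgs {

    static final long MIB = 1024L * 1024L;

    static String metaspaceArgs(long initMetaspaceBytes, long maxMetaspaceBytes) {
        if (initMetaspaceBytes <= 0 || maxMetaspaceBytes <= 0) {
            throw new IllegalArgumentException("Metaspace sizes must be positive");
        }
        // Reject an inconsistent configuration early instead of relying on the JVM to adjust it.
        if (initMetaspaceBytes > maxMetaspaceBytes) {
            throw new IllegalArgumentException(
                    "Initial Metaspace size " + initMetaspaceBytes
                            + " exceeds maximum Metaspace size " + maxMetaspaceBytes);
        }
        List<String> args = new ArrayList<>();
        // Threshold at which the first Metaspace-induced GC is triggered.
        args.add("-XX:MetaspaceSize=" + initMetaspaceBytes);
        // Hard upper bound on Metaspace.
        args.add("-XX:MaxMetaspaceSize=" + maxMetaspaceBytes);
        return String.join(" ", args);
    }

    public static void main(String[] args) {
        // 128 MiB initial threshold and 256 MiB cap, the values seen in the GC log header.
        System.out.println(metaspaceArgs(128 * MIB, 256 * MIB));
    }
}
{code}

Validating that the initial size does not exceed the maximum keeps the two options consistent before the process is launched, rather than leaving the adjustment to the JVM.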