[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183942#comment-17183942 ]
Chesnay Schepler commented on FLINK-19005:
------------------------------------------

[~DaDaShen]
* load the heap dump in Eclipse MAT
* create a histogram
* group classes by classloader
* among others, we can see several ChildFirstClassLoader objects
** these are the user classloaders
** because they are still around, something is leaking them
* select one of these entries, and merge the shortest paths to GC roots
* there is now one entry for the system classloader
* drilling down into it, we find {{java.sql.DriverManager}}
** its registeredDrivers array contains multiple drivers for druid, postgresql and calcite
* select any of these drivers, and use Java Basics -> Class Loader Explorer
* you are now shown a ChildFirstClassLoader

This means that the driver originates from the user classloader but is referenced from the system classloader. If the reference in the latter is not removed (due to improper cleanup), then the user classloader cannot be garbage collected.

> used metaspace grow on every execution
> --------------------------------------
>
>                 Key: FLINK-19005
>                 URL: https://issues.apache.org/jira/browse/FLINK-19005
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Runtime / Configuration, Runtime / Coordination
>    Affects Versions: 1.11.1
>            Reporter: Guillermo Sánchez
>            Assignee: Chesnay Schepler
>            Priority: Major
>         Attachments: heap_dump_after_10_executions.zip, heap_dump_after_1_execution.zip, heap_dump_echo_lee.tar.xz
>
>
> Hi!
> I'm running a 1.11.1 Flink cluster, where I execute batch jobs built with the DataSet API.
> I submit these jobs every day to calculate daily data.
> In every execution, the cluster's used metaspace increases by 7MB, and it is never released.
> This ends up with an OutOfMemoryError caused by Metaspace every 15 days, and I need to restart the cluster to clean the metaspace.
> taskmanager.memory.jvm-metaspace.size is set to 512mb
> Any idea of what could be causing this metaspace growth and why it is not released?
>
> ================================================
> === Summary ====================================
> ================================================
> Case 1, reported by [~gestevez]:
> * Flink 1.11.1
> * Java 11
> * Maximum Metaspace size set to 512mb
> * Custom Batch job, submitted daily
> * Requires a restart every 15 days after an OOM
> Case 2, reported by [~Echo Lee]:
> * Flink 1.11.0
> * Java 11
> * G1GC
> * WordCount Batch job, submitted every second / every 5 minutes
> * TaskExecutor eventually fails with OOM
> Case 3, reported by [~DaDaShen]:
> * Flink 1.11.0
> * Java 11
> * WordCount Batch job, submitted every 5 seconds
> * growing Metaspace, eventually OOM

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
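The cleanup that Chesnay's analysis points to can be sketched in Java: before a user classloader is discarded, any JDBC driver it defined should be removed from `java.sql.DriverManager`, so the system classloader no longer holds a reference to it. This is a minimal illustration under that assumption, not Flink's actual fix. `DriverCleanup`, `deregisterDriversLoadedBy` and `DummyDriver` are hypothetical names; only `DriverManager.getDrivers()`, `DriverManager.registerDriver(Driver)` and `DriverManager.deregisterDriver(Driver)` are standard JDBC APIs.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

class DriverCleanup {

    /**
     * Hypothetical helper: deregisters every driver in DriverManager whose class
     * was defined by the given classloader (e.g. a job's user-code classloader).
     * Returns the number of drivers removed.
     */
    static int deregisterDriversLoadedBy(ClassLoader loader) {
        int removed = 0;
        // Collections.list copies the enumeration, so deregistering inside the loop is safe.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() == loader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed++;
                } catch (SQLException e) {
                    // Keep going: one failing driver should not block the others.
                    System.err.println("Could not deregister " + driver + ": " + e);
                }
            }
        }
        return removed;
    }

    /** Hypothetical no-op driver, standing in for a driver bundled in a job jar. */
    static final class DummyDriver implements Driver {
        @Override public Connection connect(String url, Properties info) { return null; }
        @Override public boolean acceptsURL(String url) { return false; }
        @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        @Override public int getMajorVersion() { return 1; }
        @Override public int getMinorVersion() { return 0; }
        @Override public boolean jdbcCompliant() { return false; }
        @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Simulate a job registering its driver, then cleaning up on shutdown.
        DriverManager.registerDriver(new DummyDriver());
        int removed = deregisterDriversLoadedBy(DummyDriver.class.getClassLoader());
        System.out.println("Deregistered " + removed + " driver(s)");
    }
}
```

One design caveat: `DriverManager` only lists and deregisters drivers that are visible to the calling class's classloader. In a child-first setup like the one described above, such a cleanup would therefore have to run from code loaded by the user classloader itself; framework code in a parent classloader would not see those drivers.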