[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183182#comment-17183182 ]

Chesnay Schepler commented on FLINK-19005:
------------------------------------------

I've added a summary to the Jira description.

[~Echo Lee] [~DaDaShen] [~gestevez] Could you tell us exactly which Java 11 version you are using? (output of {{java -version}})

So far I have not really been able to reproduce the issue. I'm submitting the Batch WordCount example in a loop, such that the next job is submitted once the previous one finishes. I do see the Metaspace usage going up, but once it gets close to the maximum Metaspace size the GC kicks in. My current run has been going for about an hour and has run roughly 1500 jobs, and I can see several dips in the Metaspace usage.

In one instance I did get a TaskExecutor crash of sorts, but increasing the Metaspace size by as little as 2mb fixed this issue. I do not consider this a successful reproduction, as I initially conducted the tests with a very low max size of 40mb, and it is somewhat expected that things may fail when it is configured to such a low value.

> used metaspace grow on every execution
> --------------------------------------
>
>                 Key: FLINK-19005
>                 URL: https://issues.apache.org/jira/browse/FLINK-19005
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataSet, Client / Job Submission
>    Affects Versions: 1.11.1
>            Reporter: Guillermo Sánchez
>            Assignee: Chesnay Schepler
>            Priority: Major
>         Attachments: heap_dump_after_10_executions.zip,
> heap_dump_after_1_execution.zip
>
>
> Hi!
> I'm running a 1.11.1 Flink cluster, where I execute batch jobs made with
> the DataSet API.
> I submit these jobs every day to calculate daily data.
> In every execution, the cluster's used metaspace increases by 7MB and it is
> never released.
> This ends up with an OutOfMemoryError caused by Metaspace every 15 days, and I
> need to restart the cluster to clean the metaspace.
> taskmanager.memory.jvm-metaspace.size is set to 512mb
> Any idea of what could be causing this metaspace growth and why it is not
> released?
>
> ================================================
> === Summary ======================================
> ================================================
> Case 1, reported by [~gestevez]:
> * Flink 1.11.1
> * Java 11
> * Maximum Metaspace size set to 512mb
> * Custom Batch job, submitted daily
> * Requires restart every 15 days after an OOM
> Case 2, reported by [~Echo Lee]:
> * Flink 1.11.0
> * Java 11
> * G1GC
> * WordCount Batch job, submitted every second / every 5 minutes
> * eventually fails TaskExecutor with OOM
> Case 3, reported by [~DaDaShen]
> * Flink 1.11.0
> * Java 11
> * WordCount Batch job, submitted every 5 seconds
> * growing Metaspace, eventually OOM
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)