Hi All,

Recently, FLIP-49 [1] introduced a new JVM Metaspace limit in the 1.10 release [2]. The Flink scripts that start the task manager JVM process set this limit by adding the corresponding JVM argument (-XX:MaxMetaspaceSize). This was done to plan resources properly, especially to derive the container size for Yarn/Mesos/Kubernetes. The limit should also surface potential class loading leaks. It can be changed with the option 'taskmanager.memory.jvm-metaspace.size' [3], whose current default value is 96 MB.
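To illustrate how the option value turns into the JVM limit, here is a minimal Python sketch. It assumes a simplified subset of Flink's memory-size syntax (a number plus an optional binary unit such as "m" or "g") and assumes the limit is passed as a byte count. The helper names parse_memory_size and metaspace_jvm_arg are hypothetical, and the exact argument formatting produced by Flink's own scripts may differ. In practice you would only set the option, e.g. 'taskmanager.memory.jvm-metaspace.size: 256m' in flink-conf.yaml (256m is just an illustrative value).

```python
import re

# Binary multipliers: an assumed, simplified subset of the units that
# Flink memory options accept (e.g. "96m", "256mb", "1g").
_UNITS = {
    "": 1,
    "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
    "t": 1024 ** 4, "tb": 1024 ** 4,
}


def parse_memory_size(value: str) -> int:
    """Hypothetical helper: parse a size string such as '96m' into bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", value)
    if match is None:
        raise ValueError(f"invalid memory size: {value!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNITS:
        raise ValueError(f"unknown memory unit: {unit!r}")
    return int(number) * _UNITS[unit]


def metaspace_jvm_arg(config_value: str = "96m") -> str:
    """Hypothetical helper: build the JVM flag capping Metaspace."""
    return f"-XX:MaxMetaspaceSize={parse_memory_size(config_value)}"


if __name__ == "__main__":
    print(metaspace_jvm_arg())        # -XX:MaxMetaspaceSize=100663296 (96m default)
    print(metaspace_jvm_arg("256m"))  # -XX:MaxMetaspaceSize=268435456
```

Raising the option value simply raises this cap; it does not help if a class loading leak keeps filling Metaspace.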
This change led to 'OutOfMemoryError: Metaspace' in certain cases after upgrading to version 1.10. In some cases, a class loading leak was detected [4] and has to be investigated separately. In other cases, simply increasing the option value helped because the default value was not enough, presumably due to the specifics of the job. In general, the required Metaspace size depends on the job, and no single default value covers all cases. There is an issue to improve the docs regarding this concern [5].

This survey aims to find the most reasonable default value for this option. If you have encountered this issue and increasing the Metaspace size resolved it (i.e. there was no class loading leak), please report the option value that resolved it, along with any specifics of your job that you think are relevant. There is also a dedicated Jira issue [6] for reporting.

Thanks,
Andrey

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#jvm-parameters
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-jvm-metaspace-size
[4] https://issues.apache.org/jira/browse/FLINK-16142
[5] https://issues.apache.org/jira/browse/FLINK-16278
[6] https://jira.apache.org/jira/browse/FLINK-16406