Re: Random Task executor shutdown (java.lang.OutOfMemoryError: Metaspace)

Jan Lukavský Mon, 16 Nov 2020 06:15:23 -0800

Hi Flavio,

When I encountered a problem quite similar to the one you describe, it was related to a static storage located in a class that was loaded "parent-first". In my case it was in java.lang.ClassValue, but it might (and probably will) be different in your case. The problem is that if user code registers something in some (static) storage located in a class loaded by the parent (TaskManager) classloader, then its associated classes will never be GC'd and Metaspace will keep growing. A good starting point would be not to focus on the biggest consumers of heap in general, but to look at where the 15k objects of type Class are referenced from. That might help you figure this out. I'm not sure whether something can be done in general to prevent this type of leak; that would probably be a question for the dev@ mailing list.
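A minimal Java sketch of the pattern described above, under stated assumptions: ParentRegistry and submitJob are hypothetical names for illustration, not Flink or JDK APIs. Each simulated job gets its own fresh classloader (roughly what happens per submission on a session cluster), and the job registers an object into a static list owned by a parent-loaded class. Unless something removes that entry, the job's loader stays reachable after the job ends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ParentFirstLeakSketch {

    // Hypothetical stand-in for a class loaded parent-first (e.g. from the Flink lib/
    // directory): its static fields live as long as the TaskManager JVM itself.
    static final class ParentRegistry {
        private static final class Entry {
            final ClassLoader owner; // in real code this link is implicit: value.getClass().getClassLoader()
            final Object value;

            Entry(ClassLoader owner, Object value) {
                this.owner = owner;
                this.value = value;
            }
        }

        private static final List<Entry> ENTRIES = new ArrayList<>();

        static synchronized void register(ClassLoader owner, Object value) {
            ENTRIES.add(new Entry(owner, value));
        }

        // Cleanup that has to run when a job's classloader is released; without it the
        // loader, and every class it defined, stays strongly reachable from ENTRIES.
        static synchronized int releaseAll(ClassLoader owner) {
            int before = ENTRIES.size();
            ENTRIES.removeIf(e -> e.owner == owner);
            return before - ENTRIES.size();
        }

        // Number of distinct user-code classloaders currently pinned by the registry.
        static synchronized long distinctOwners() {
            return ENTRIES.stream().map(e -> e.owner).distinct().count();
        }
    }

    // Simulates one job submission: a fresh user-code classloader registers a callback.
    static void submitJob(boolean cleanUp) {
        try (URLClassLoader userLoader =
                     new URLClassLoader(new URL[0], ParentFirstLeakSketch.class.getClassLoader())) {
            ParentRegistry.register(userLoader, new Object());
            if (cleanUp) {
                ParentRegistry.releaseAll(userLoader);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // close() releases open JAR handles, but it does not make the loader
        // unreachable: only releaseAll() removes the static reference.
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20; i++) {
            submitJob(false); // e.g. 20 submissions of a job that never deregisters
        }
        System.out.println("pinned loaders: " + ParentRegistry.distinctOwners());
    }
}
```

Running main prints "pinned loaders: 20". The sketch suggests that the fix belongs on the registering side: remove the entry when the job shuts down (for example from a RichFunction's close() method), or avoid keeping user objects in parent-loaded static state at all.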


Best,

 Jan

On 11/16/20 2:27 PM, Flavio Pompermaier wrote:

Hello everybody,
I was writing this email when a similar thread appeared on this mailing list. The difference is that the other problem seems to be related to Flink 1.10 on YARN and does not output anything helpful for debugging the cause of the problem.
In my case I use Flink 1.11.0 on a standalone session cluster (the job is submitted to the cluster using the CLI client). The problem arises when I submit the same job about 20 times (unfortunately this number is not deterministic and can vary a little). The error reported by the Task Executor is related to the ever-growing Metaspace, and the error message itself is quite detailed [1].
I found the same issue in some previous threads on this mailing list and I've tried to figure out the cause of the problem. The issue is that looking at the allocated objects doesn't give me any idea of the source of the problem, because the types of objects consuming the memory are general-purpose ones (i.e. byte arrays, int arrays and Strings). These are my "top" memory consumers according to the output of jmap -histo <PID>:
At run 0:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:         46238       13224056  [B (java.base@11.0.9.1)
   2:          3736        6536672  [I (java.base@11.0.9.1)
   3:         38081         913944  java.lang.String (java.base@11.0.9.1)
   4:            26         852384  [Lakka.dispatch.forkjoin.ForkJoinTask;
   5:          7146         844984  java.lang.Class (java.base@11.0.9.1)

At run 1:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:         77608       25317496  [B (java.base@11.0.9.1)
   2:          7004        9088360  [I (java.base@11.0.9.1)
   3:         15814        1887256  java.lang.Class (java.base@11.0.9.1)
   4:         67381        1617144  java.lang.String (java.base@11.0.9.1)
   5:          3906        1422960  [Ljava.util.HashMap$Node; (java.base@11.0.9.1)

At run 6:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:         81408       25375400  [B (java.base@11.0.9.1)
   2:         12479        7249392  [I (java.base@11.0.9.1)
   3:         29090        3496168  java.lang.Class (java.base@11.0.9.1)
   4:          4347        2813416  [Ljava.util.HashMap$Node; (java.base@11.0.9.1)
   5:         71584        1718016  java.lang.String (java.base@11.0.9.1)

At run 8:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:        985979      127193256  [B (java.base@11.0.9.1)
   2:         35400       13702112  [I (java.base@11.0.9.1)
   3:        260387        6249288  java.lang.String (java.base@11.0.9.1)
   4:        148836        5953440  java.util.HashMap$KeyIterator (java.base@11.0.9.1)
   5:         17641        5222344  [Ljava.util.HashMap$Node; (java.base@11.0.9.1)
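In these histograms the number of java.lang.Class instances grows from 7146 at run 0 to 15814 at run 1 and 29090 at run 6. A minimal sketch for tracking that trend directly, assuming it runs inside the TaskManager JVM (or that the same standard MXBean attributes are read over remote JMX); the class name MetaspaceWatch is hypothetical. It uses only the standard java.lang.management API and assumes a HotSpot JVM, whose class-metadata pool is named "Metaspace".

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceWatch {

    /** Class-loading counters and Metaspace usage of the current JVM at one point in time. */
    static final class Snapshot {
        final int loadedClasses;    // classes currently loaded
        final long totalLoaded;     // classes loaded since JVM start
        final long unloadedClasses; // classes unloaded since JVM start
        final long metaspaceUsed;   // bytes; -1 if no pool named "Metaspace" was found
        final long metaspaceMax;    // bytes; -1 if the limit is undefined or no pool was found

        Snapshot(int loaded, long total, long unloaded, long used, long max) {
            this.loadedClasses = loaded;
            this.totalLoaded = total;
            this.unloadedClasses = unloaded;
            this.metaspaceUsed = used;
            this.metaspaceMax = max;
        }

        @Override
        public String toString() {
            return String.format("loaded=%d total=%d unloaded=%d metaspaceUsed=%d metaspaceMax=%d",
                    loadedClasses, totalLoaded, unloadedClasses, metaspaceUsed, metaspaceMax);
        }
    }

    static Snapshot take() {
        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
        long used = -1;
        long max = -1;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot reports class metadata in a pool named "Metaspace"; other JVMs may differ.
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    used = usage.getUsed();
                    max = usage.getMax(); // -1 when no maximum is defined
                }
            }
        }
        return new Snapshot(classLoading.getLoadedClassCount(),
                classLoading.getTotalLoadedClassCount(),
                classLoading.getUnloadedClassCount(),
                used, max);
    }

    public static void main(String[] args) {
        // Take a snapshot between job submissions and compare the numbers.
        System.out.println(take());
    }
}
```

If the loaded count rises by a similar amount after every submission while the unloaded count barely moves, even after a full GC, the classes of earlier jobs are probably still reachable from somewhere. The metaspaceMax value should correspond to taskmanager.memory.jvm-metaspace.size, which, per the Flink memory configuration documentation, is passed to the TaskManager JVM as -XX:MaxMetaspaceSize.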
Thanks in advance for any help,
Flavio
[1]
--------------------------------------------------------------------------------------------------
java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'taskmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
        at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
        at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?]
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:550) ~[?:?]
        at java.net.URLClassLoader$1.run(URLClassLoader.java:458) ~[?:?]
        at java.net.URLClassLoader$1.run(URLClassLoader.java:452) ~[?:?]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
        at java.net.URLClassLoader.findClass(URLClassLoader.java:451) ~[?:?]
        at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
        at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) [flink-dist_2.12-1.11.0.jar:1.11.0]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:522) [?:?]
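The stack trace shows the error being raised while a child-first loader defines a user class (URLClassLoader.findClass, then defineClass). A minimal Java sketch of how such child-first ordering can work, for illustration only: it is not Flink's ChildFirstClassLoader, and the class name ChildFirstLoaderSketch and the parent-first prefixes passed to it are assumptions. The point relevant to this thread is that every class defined through findClass belongs to that loader, so its Metaspace can only be reclaimed once nothing, such as a static field in a parent-loaded class, references the loader or any of its classes.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ChildFirstLoaderSketch extends URLClassLoader {

    // Class-name prefixes that are always delegated to the parent, so that JDK and
    // framework types are shared between user code and the framework (illustrative).
    private final String[] alwaysParentFirst;

    public ChildFirstLoaderSketch(URL[] urls, ClassLoader parent, String... alwaysParentFirst) {
        super(urls, parent);
        this.alwaysParentFirst = alwaysParentFirst;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name)) {
                    c = super.loadClass(name, false); // standard parent delegation
                } else {
                    try {
                        // Child first: define the class from this loader's own URLs. Each class
                        // defined here takes Metaspace until this loader becomes unreachable.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, false); // not in the job's URLs: ask the parent
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private boolean isParentFirst(String className) {
        for (String prefix : alwaysParentFirst) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With this ordering, a class that exists both in the job JAR and on the parent classpath is taken from the job JAR, which is why a parent-loaded static registry holding such classes pins the whole per-job loader.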
