I've tried to remove all possible imports of classes not contained in the
fat jar but I still face the same problem.
I've also tried to reduce as much as possible the exclude in the shade
section of the maven plugin (I took the one at [1]) so now I exclude only
few dependencies..could it be that I should include org.slf4j:* if I use
static import of it?
<artifactSet>
<excludes>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/project-configuration.html#appendix-template-for-building-a-jar-with-dependencies
On Mon, Nov 16, 2020 at 3:29 PM Jan Lukavský <[email protected]> wrote:
> Yes, that could definitely cause this. You should probably avoid using
> these flink-internal shaded classes and ship your own versions (not shaded).
>
> Best,
>
> Jan
> On 11/16/20 3:22 PM, Flavio Pompermaier wrote:
>
> Thank you Jan for your valuable feedback.
> Could it be that I should not use import shaded-jackson classes in my user
> code?
> For example import
> org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper?
>
> Bets,
> Flavio
>
> On Mon, Nov 16, 2020 at 3:15 PM Jan Lukavský <[email protected]> wrote:
>
>> Hi Flavio,
>>
>> when I encountered quite similar problem that you describe, it was
>> related to a static storage located in class that was loaded
>> "parent-first". In my case it was it was in java.lang.ClassValue, but it
>> might (and probably will be) different in your case. The problem is that if
>> user-code registers something in some (static) storage located in class
>> loaded with parent (TaskTracker) classloader, then its associated classes
>> will never be GC'd and Metaspace will grow. A good starting point would be
>> not to focus on biggest consumers of heap (in general), but to look at
>> where the 15k objects of type Class are referenced from. That might help
>> you figure this out. I'm not sure if there is something that can be done in
>> general to prevent this type of leaks. That would be probably question on
>> dev@ mailing list.
>>
>> Best,
>>
>> Jan
>> On 11/16/20 2:27 PM, Flavio Pompermaier wrote:
>>
>> Hello everybody,
>> I was writing this email when a similar thread on this mailing list
>> appeared..
>> The difference is that the other problem seems to be related with Flink
>> 1.10 on YARN and does not output anything helpful in debugging the cause of
>> the problem.
>>
>> Indeed, in my use case I use Flink 1.11.0 and Flink on a standalone
>> session cluster (the job is submitted to the cluster using the CLI client).
>> The problem arises when I submit the same job for about 20 times (this
>> number unfortunately is not deterministic and can change a little bit). The
>> error reported by the Task Executor is related to the ever growing
>> Metaspace..the error seems to be pretty detailed [1].
>>
>> I found the same issue in some previous threads on this mailing list and
>> I've tried to figure it out the cause of the problem. The issue is that
>> looking at the objects allocated I don't really get an idea of the source
>> of the problem because the type of objects that are consuming the memory
>> are of general purpose (i.e. Bytes, Integers and Strings)...these are my
>> "top" memory consumers if looking at the output of jmap -histo <PID>:
>>
>> At run 0:
>>
>> num #instances #bytes class name (module)
>> -------------------------------------------------------
>> 1: 46238 13224056 [B ([email protected])
>> 2: 3736 6536672 [I ([email protected])
>> 3: 38081 913944 java.lang.String ([email protected])
>> 4: 26 852384 [Lakka.dispatch.forkjoin.ForkJoinTask;
>> 5: 7146 844984 java.lang.Class ([email protected])
>>
>> At run 1:
>>
>> 1: 77.608 25.317.496 [B ([email protected])
>> 2: 7.004 9.088.360 [I ([email protected])
>> 3: 15.814 1.887.256 java.lang.Class (
>> [email protected])
>> 4: 67.381 1.617.144 java.lang.String (
>> [email protected])
>> 5: 3.906 1.422.960 [Ljava.util.HashMap$Node; (
>> [email protected])
>>
>> At run 6:
>>
>> 1: 81.408 25.375.400 [B ([email protected])
>> 2: 12.479 7.249.392 [I ([email protected])
>> 3: 29.090 3.496.168 java.lang.Class (
>> [email protected])
>> 4: 4.347 2.813.416 [Ljava.util.HashMap$Node; (
>> [email protected])
>> 5: 71.584 1.718.016 java.lang.String (
>> [email protected])
>>
>> At run 8:
>>
>> 1: 985.979 127.193.256 [B ([email protected])
>> 2: 35.400 13.702.112 [I ([email protected])
>> 3: 260.387 6.249.288 java.lang.String (
>> [email protected])
>> 4: 148.836 5.953.440 java.util.HashMap$KeyIterator (
>> [email protected])
>> 5: 17.641 5.222.344 [Ljava.util.HashMap$Node; (
>> [email protected])
>>
>> Thanks in advance for any help,
>> Flavio
>>
>> [1]
>> --------------------------------------------------------------------------------------------------
>> java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error
>> has occurred. This can mean two things: either the job requires a larger
>> size of JVM metaspace to load classes or there is a class loading leak. In
>> the first case 'taskmanager.memory.jvm-metaspace.size' configuration option
>> should be increased. If the error persists (usually in cluster after
>> several job (re-)submissions) then there is probably a class loading leak
>> in user code or some of its dependencies which has to be investigated and
>> fixed. The task executor has to be shutdown...
>> at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
>> at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?]
>> at
>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
>> ~[?:?]
>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:550)
>> ~[?:?]
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:458) ~[?:?]
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:452) ~[?:?]
>> at java.security.AccessController.doPrivileged(Native Method)
>> ~[?:?]
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:451)
>> ~[?:?]
>> at
>> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71)
>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>> at
>> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48)
>> [flink-dist_2.12-1.11.0.jar:1.11.0]
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:522) [?:?]
>>
>>