I just realised that I replied directly to Sudharsan only. FYI, in case anyone else wants to read this email too:

> On 2 Mar 2022, at 11:36 PM, yu'an huang <h.yuan...@gmail.com> wrote:
>
> Hi Sudharsan,
>
> I think you are right. I just tried your scenario. I set up a standalone
> session cluster with 2 task managers (one slot each) in Flink 1.11. Then I
> submitted a job (parallelism 2), cancelled it, submitted it a second time,
> and cancelled it again. After that, I used VisualVM to analyse memory dumps
> of the task managers.
>
> What I observed is that there are two instances of ChildFirstClassLoader
> (the user class loader). The first one (belonging to the first job) has a GC
> root, "Flink Netty Client (0) Thread 0", so it cannot be GCed. The second
> one (belonging to the second job) has no GC root, so it will be GCed in the
> future.
>
> So the interesting question is why the user class loader becomes the
> ContextClassLoader of Flink Netty Client (0) Thread 0. After checking the
> code, my understanding is:
> 1. When a thread is created, its context ClassLoader is provided by the
>    creator of the thread. If not set, the default is the context ClassLoader
>    of the parent thread.
> 2. A task runs in the "executingThread". Before starting the run loop, the
>    executingThread sets the userClassLoader as its ContextClassLoader. Then
>    the task calls the invokable (which contains UDFs etc.).
> 3. Before the invokable enters the MailboxLoop, it needs to create the input
>    channels to receive data from upstream. If the upstream is not local, it
>    needs to set up a connection to the remote side, so a Flink Netty Client
>    thread may be created, which will have the executingThread's
>    ContextClassLoader as its ContextClassLoader.
> 4. Even though the job is cancelled, the Flink Netty Client thread still
>    exists because future jobs will need it. That is why the user class
>    loader cannot be GCed until the Flink Netty Client is destroyed.
>
> I feel that it might be very difficult to ensure that no leak happens. If
> this doesn't affect your job, it is probably okay to ignore it for now. This
> issue may be fixed, or may already have been fixed (I haven't confirmed), in
> a later Flink version.
>
> I hope this answer is not too late for you.
>
> Best,
> Yuan
>
>> On 1 Mar 2022, at 11:52 PM, Sudharsan R <sud.r...@gmail.com> wrote:
>>
>> Hello,
>> I'm running a Flink 1.11.1 cluster. When I submit a job, it spawns a thread
>> named "Flink Netty Client (0) Thread 0 Thread". It seems to be executing
>> "org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap".
>> This thread has a ChildFirstClassLoader as its context classloader
>> (the one that is also the classloader for my app jar).
>> Eventually, I cancel the job and this particular ChildFirstClassLoader
>> remains, with the netty thread as its GC root.
>>
>> If I submit another job (same app jar as before) and cancel it, the
>> ChildFirstClassLoader of this submission eventually gets GC'ed. However, the
>> original one remains and seems to be leaked.
>>
>> I don't see others complaining about this. And I don't do anything with this
>> netty client directly! Any thoughts on what I can do?
>>
>> Thanks
>> Sudharsan
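
P.S. The inheritance described in points 1 to 3 of my reply can be reproduced outside Flink with plain JDK threads. The following is only a minimal sketch, not Flink code: the class name, the method name, the thread names, and the empty URLClassLoader standing in for the ChildFirstClassLoader are all made up for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

public class ContextClassLoaderInheritance {

    /**
     * Starts a stand-in "task" thread that installs userLoader as its context
     * class loader and then creates a child thread without setting a loader
     * on it. Returns the context class loader observed inside the child.
     */
    static ClassLoader loaderSeenByChild(ClassLoader userLoader) {
        AtomicReference<ClassLoader> seen = new AtomicReference<>();

        Thread task = new Thread(() -> {
            // Point 2: the task thread sets the user class loader on itself.
            Thread.currentThread().setContextClassLoader(userLoader);

            // Point 3: a thread created from here inherits the creator's
            // context class loader because none is set explicitly.
            // (Hypothetical stand-in for the Netty client thread.)
            Thread child = new Thread(
                    () -> seen.set(Thread.currentThread().getContextClassLoader()),
                    "hypothetical-netty-client");
            child.start();
            try {
                child.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }, "hypothetical-task-thread");

        task.start();
        try {
            task.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        // Assumption: an empty URLClassLoader plays the role of the user class loader.
        ClassLoader userLoader =
                new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        System.out.println(loaderSeenByChild(userLoader) == userLoader); // prints "true"
    }
}
```

In this sketch the child thread exits immediately, so nothing leaks. In the scenario above the child is long-lived, which is why it keeps the first job's class loader reachable (point 4).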