I just realised that I replied only to Sudharsan. Forwarding my reply to the 
list in case anyone else wants to read it:



> On 2 Mar 2022, at 11:36 PM, yu'an huang <h.yuan...@gmail.com> wrote:
> 
> Hi Sudharsan,
> 
> I think you are right. I just tried your scenario. I set up a standalone 
> session cluster with 2 task managers (one slot each) on Flink 1.11. Then I 
> submitted a job (parallelism 2), cancelled it, submitted the same job a 
> second time, and cancelled it again. After that, I used VisualVM to analyse 
> heap dumps of the task managers. 
> 
> What I observed is that there are two instances of ChildFirstClassLoader 
> (the user class loader). The first one (belonging to the first job) does 
> have a GC root, which is “Flink Netty Client (0) Thread 0”, so it cannot be 
> GCed. The second one (belonging to the second job) has no GC root, so it 
> will be GCed in the future.
> 
> So the interesting question is why the user class loader becomes the 
> ContextClassLoader of Flink Netty Client (0) Thread 0. After checking the 
> code, my understanding is:
> 1. When a thread is created, its context ClassLoader is provided by the 
> creator of the thread. If it is not set, it defaults to the context 
> ClassLoader of the parent thread. 
> 2. A task runs in the “executingThread”. Before starting its run loop, the 
> executingThread sets the userClassLoader as its ContextClassLoader. Then the 
> task calls the invokable (which contains the UDFs etc.).
> 3. Before the invokable enters the MailboxLoop, it needs to create the input 
> channels to receive data from upstream. If the upstream is not local, it 
> needs to set up a connection to the remote side, so a Flink Netty Client 
> thread may be created, which inherits the executingThread’s 
> ContextClassLoader as its own ContextClassLoader.
> 4. Even after the job is cancelled, the Flink Netty Client thread still 
> exists because future jobs will need it. That is why the user class loader 
> cannot be GCed until the Flink Netty Client is destroyed.
> 
> I feel that it might be very difficult to ensure that no leak happens. If 
> this doesn't affect your job, it is probably okay to ignore it for now. This 
> issue may be fixed in a future Flink release, or may already be fixed (I 
> haven’t confirmed).
> 
> 
> Hope this answer is not too late for you.
> 
> Best,
> Yuan
> 
> 
> 
> 
> 
> 
> 
>> On 1 Mar 2022, at 11:52 PM, Sudharsan R <sud.r...@gmail.com> wrote:
>> 
>> Hello,
>> I'm running a Flink 1.11.1 cluster. When I submit a job, it spawns a thread 
>> named "Flink Netty Client (0) Thread 0 Thread". It seems to be executing 
>> "org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap".
>> This thread has a ChildFirstClassLoader as its context classloader 
>> (the same one that is the classloader for my app jar). 
>> Eventually, I cancel the job, and this particular ChildFirstClassLoader 
>> remains, with the netty thread as its GC root.
>> 
>> If I submit another job (same app jar as before) and cancel it, the 
>> ChildFirstClassLoader of this submission eventually gets GC'ed. However, the 
>> original one remains and seems to be leaked.
>> 
>> I don't see others complaining about this. And I don't do anything with this 
>> netty client directly! Any thoughts on what I can do?
>> 
>> Thanks
>> Sudharsan
>> 
>> 
> 
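
The inheritance described in steps 1 to 4 of the quoted reply can be reproduced 
without Flink. The following is only a minimal sketch in plain Java, not Flink 
code: a URLClassLoader stands in for the ChildFirstClassLoader, and the class 
name, method name and thread names are all made up for illustration. A "task" 
thread that carries the user class loader creates a long-lived thread, and 
that thread keeps referencing the loader after the task thread has finished.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class ContextClassLoaderInheritance {

    /**
     * Runs a stand-in "task" thread whose context class loader is userLoader,
     * lets it start a long-lived daemon thread, waits for the task thread to
     * finish, and returns the still-running daemon thread.
     */
    static Thread startSharedThreadFromTask(ClassLoader userLoader, CountDownLatch keepAlive)
            throws InterruptedException {
        AtomicReference<Thread> shared = new AtomicReference<>();

        Thread task = new Thread(() -> {
            // This thread is constructed while the task thread carries the
            // user class loader, so it inherits that loader as its own
            // context class loader (no explicit setContextClassLoader call).
            Thread t = new Thread(() -> {
                try {
                    keepAlive.await(); // stays alive, like a shared client thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "shared-client-thread");
            t.setDaemon(true);
            t.start();
            shared.set(t);
        }, "task-thread");

        // Corresponds to step 2: the task thread gets the user class loader.
        task.setContextClassLoader(userLoader);
        task.start();
        task.join(); // the "job" is over once the task thread has terminated
        return shared.get();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the per-job user class loader.
        ClassLoader userLoader = new URLClassLoader(
                new URL[0], ContextClassLoaderInheritance.class.getClassLoader());
        CountDownLatch keepAlive = new CountDownLatch(1);

        Thread shared = startSharedThreadFromTask(userLoader, keepAlive);

        System.out.println("shared thread alive after task ended: " + shared.isAlive());
        System.out.println("shared thread still references user loader: "
                + (shared.getContextClassLoader() == userLoader));

        keepAlive.countDown(); // let the daemon thread exit
    }
}
```

Both lines print true: as long as the long-lived thread exists, it is a GC 
root for the loader it inherited, which matches what the heap dump showed for 
the first job's ChildFirstClassLoader.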
