Hi Subramanyam,

I don't think you need the fix in FLINK-10455, which applies to Kafka only. It is just a similar issue to the one you ran into. As you said, you need to make sure that any threads spawned by an operator/UDF are stopped in its close() method. That way, no thread is left running to throw NoClassDefFoundError after the class loader has been closed.
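To illustrate stopping spawned threads in close(), here is a minimal sketch, not your actual code. ThreadOwningFunction is a hypothetical stand-in for a RichMapFunction with no Flink dependency, so it can be run on its own; the thread name and the 5-second timeout are arbitrary assumptions. In a real function, the same logic would go into the open(Configuration) and close() overrides.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a RichMapFunction: open() and close() mirror the
// Flink lifecycle methods, but this class does not depend on Flink.
class ThreadOwningFunction {

    // In a real Flink function this field would be declared transient.
    private ExecutorService worker;

    // Called once before processing starts: spawns the background thread.
    public void open() {
        worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "udf-background-worker"); // assumed name
            t.setDaemon(true);
            return t;
        });
        worker.submit(() -> {
            // Loop until the thread is interrupted by close().
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    TimeUnit.MILLISECONDS.sleep(50); // placeholder for real work
                } catch (InterruptedException e) {
                    // Restore the flag so the loop condition sees it and exits.
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Called when the task finishes, fails, or is cancelled: stops the thread
    // and waits for it, so nothing keeps running once the class loader is closed.
    public void close() {
        if (worker == null) {
            return;
        }
        worker.shutdownNow(); // interrupts the running task
        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) { // assumed timeout
                System.err.println("background worker did not stop within 5s");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean isWorkerTerminated() {
        return worker != null && worker.isTerminated();
    }

    public static void main(String[] args) {
        ThreadOwningFunction fn = new ThreadOwningFunction();
        fn.open();
        fn.close();
        System.out.println("worker terminated: " + fn.isWorkerTerminated());
    }
}
```

If the thread is created inside a third-party library rather than by your own code, the equivalent step would be to call whatever shutdown or close method that library exposes from close().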
Thanks,
Zhu Zhu

On Tue, Sep 24, 2019 at 8:07 PM, Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote:

> Hi,
>
> Thank you.
>
> I think the takeaway for us is that we need to make sure that the threads
> are stopped in the close() method.
>
> With regard to FLINK-10455, I see that the fix versions say: 1.5.6,
> 1.7.0, 1.7.3, 1.8.1, 1.9.0
>
> However, I’m unable to find 1.7.3 on the downloads page
> (https://flink.apache.org/downloads.html). Is it yet to be released, or
> perhaps I am not looking in the right place?
>
> We’re currently using 1.7.2. Could you please let me know what is the
> minimal upgrade for me to consume the fix for FLINK-10455?
>
> Thanks,
> Subbu
>
> *From:* Dian Fu [mailto:dian0511...@gmail.com]
> *Sent:* Monday, September 23, 2019 1:54 PM
> *To:* Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> *Cc:* Zhu Zhu <reed...@gmail.com>; user@flink.apache.org
> *Subject:* Re: NoClassDefFoundError in failing-restarting job that uses
> url classloader
>
> Hi Subbu,
>
> The issue you encountered is very similar to the issue that was fixed in
> FLINK-10455 [1]. Could you check whether that fix solves your problem?
> The root cause of that issue is that the close() method did not close
> everything. After close() is called, the class loader (URLClassLoader)
> is closed. If a thread is still running after close() is called, it may
> access classes in the user-provided jars. However, as the URLClassLoader
> has already been closed, a NoClassDefFoundError is thrown.
>
> Regards,
> Dian
>
> [1] https://issues.apache.org/jira/browse/FLINK-10455
>
> On Sep 23, 2019, at 2:50 PM, Subramanyam Ramanathan <
> subramanyam.ramanat...@microfocus.com> wrote:
>
> Hi,
>
> I was able to simulate the issue again and understand the cause a little
> better.
>
> The issue occurs when:
>
> - One of the RichMapFunction transformations uses a third party
>   library in the open() method that spawns a thread.
> - The thread doesn’t get properly closed in the close() method.
> - Once the job starts failing, we start seeing a NoClassDefFound
>   error from that thread.
>
> I understand that cleanup should be done in the close() method. However,
> just wanted to know, do we have some kind of a configuration setting which
> would help us clean up such threads?
>
> I can attach the code if required.
>
> Thanks,
> Subbu
>
> *From:* Zhu Zhu [mailto:reed...@gmail.com]
> *Sent:* Friday, August 9, 2019 7:43 AM
> *To:* Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: NoClassDefFoundError in failing-restarting job that uses
> url classloader
>
> Hi Subramanyam,
>
> Could you share more information, including:
>
> 1. the URL pattern
> 2. the detailed exception and the log around it
> 3. the cluster the job is running on, e.g. standalone, yarn, k8s
> 4. whether it is running in session mode or per-job mode
>
> This information would be helpful to identify the failure cause.
>
> Thanks,
> Zhu Zhu
>
> On Fri, Aug 9, 2019 at 1:45 AM, Subramanyam Ramanathan <
> subramanyam.ramanat...@microfocus.com> wrote:
>
> Hello,
>
> I'm currently using flink 1.7.2.
>
> I'm trying to run a job that's submitted programmatically using the
> ClusterClient API.
>
>     public JobSubmissionResult run(PackagedProgram prog, int parallelism)
>
> The job makes use of some jars which I add to the packaged program through
> the PackagedProgram constructor, along with the jar file.
>
>     public PackagedProgram(File jarFile, List<URL> classpaths, String... args)
>
> Normally, this works perfectly and the job runs fine.
>
> However, if there's an error in the job and it goes into a failing state,
> then after it has been continuously trying to restart for an hour or so, I
> notice a NoClassDefFoundError for some classes in the jars that I load
> using the URL class loader. The job never recovers after that, even if the
> root cause of the issue is fixed (I had a Kafka source/sink in my job;
> Kafka was down temporarily and was brought back up after that).
>
> The jar is still available at the path referenced by the URL class loader
> and is not tampered with.
>
> Could anyone please give me some pointers on why this could happen, what I
> could be missing here, and how I can debug further?
>
> Thanks,
> Subbu