Hi Subramanyam,

I checked the commits. There are two fixes in FLINK-10455, and only releases 1.8.1 and 1.9.0 contain both of them.
Thanks,
Zhu Zhu

On Tue, Sep 24, 2019 at 11:02 PM Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote:
> Hi Zhu,
>
> We also use FlinkKafkaProducer(011), so I felt this fix would also be needed for us.
>
> I agree that this fix would not resolve the issue I originally mentioned, but I felt that I should pick up this fix as well.
>
> Thanks,
> Subbu
>
> *From:* Zhu Zhu [mailto:reed...@gmail.com]
> *Sent:* Tuesday, September 24, 2019 6:13 PM
> *To:* Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> *Cc:* Dian Fu <dian0511...@gmail.com>; user@flink.apache.org
> *Subject:* Re: NoClassDefFoundError in failing-restarting job that uses url classloader
>
> Hi Subramanyam,
>
> I think you do not need the fix in FLINK-10455, which is for Kafka only. It's just an issue similar to the one you hit.
> As you said, we need to make sure that the threads spawned by the operator/UDF are stopped in the close() method. This way, we can prevent those threads from throwing NoClassDefFoundError after the class loader has been closed.
>
> Thanks,
> Zhu Zhu
>
> On Tue, Sep 24, 2019 at 8:07 PM Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote:
>
> Hi,
>
> Thank you.
> I think the takeaway for us is that we need to make sure that the threads are stopped in the close() method.
>
> With regard to FLINK-10455, I see that the fix versions listed are: 1.5.6, 1.7.0, 1.7.3, 1.8.1, 1.9.0
>
> However, I'm unable to find 1.7.3 on the downloads page (https://flink.apache.org/downloads.html). Is it yet to be released, or perhaps I am not looking in the right place?
> We're currently using 1.7.2. Could you please let me know the minimal upgrade I need to pick up the fix for FLINK-10455?
>
> Thanks,
> Subbu
>
> *From:* Dian Fu [mailto:dian0511...@gmail.com]
> *Sent:* Monday, September 23, 2019 1:54 PM
> *To:* Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> *Cc:* Zhu Zhu <reed...@gmail.com>; user@flink.apache.org
> *Subject:* Re: NoClassDefFoundError in failing-restarting job that uses url classloader
>
> Hi Subbu,
>
> The issue you encountered is very similar to the one fixed in FLINK-10455 [1]. Could you check whether that fix solves your problem? The root cause of that issue was that close() did not release everything. After close() is called, the class loader (URLClassLoader) is closed. If a thread is still running after close() returns, it may try to access classes in the user-provided jars. Because the URLClassLoader has already been closed, a NoClassDefFoundError is thrown.
>
> Regards,
> Dian
>
> [1] https://issues.apache.org/jira/browse/FLINK-10455
>
> On Sep 23, 2019, at 2:50 PM, Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote:
>
> Hi,
>
> I was able to reproduce the issue again and understand the cause a little better.
>
> The issue occurs when:
> - One of the RichMapFunction transformations uses a third-party library in its open() method, and that library spawns a thread.
> - The thread is not properly stopped in the close() method.
> - Once the job starts failing, we start seeing a NoClassDefFoundError from that thread.
>
> I understand that cleanup should be done in the close() method. However, I wanted to know: is there some kind of configuration setting that would help us clean up such threads?
> I can attach the code if required.
>
> Thanks,
> Subbu
>
> *From:* Zhu Zhu [mailto:reed...@gmail.com]
> *Sent:* Friday, August 9, 2019 7:43 AM
> *To:* Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: NoClassDefFoundError in failing-restarting job that uses url classloader
>
> Hi Subramanyam,
>
> Could you share more information, including:
> 1. the URL pattern
> 2. the detailed exception and the log around it
> 3. the cluster the job is running on, e.g. standalone, YARN, k8s
> 4. whether it runs in session mode or per-job mode
>
> This information would help identify the cause of the failure.
>
> Thanks,
> Zhu Zhu
>
> On Fri, Aug 9, 2019 at 1:45 AM Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote:
>
> Hello,
>
> I'm currently using Flink 1.7.2.
>
> I'm trying to run a job that's submitted programmatically using the ClusterClient API:
>     public JobSubmissionResult run(PackagedProgram prog, int parallelism)
>
> The job makes use of some jars, which I add to the packaged program through the PackagedProgram constructor, along with the job jar file:
>     public PackagedProgram(File jarFile, List<URL> classpaths, String... args)
> Normally, this works perfectly and the job runs fine.
>
> However, if there's an error in the job, the job goes into the failing state and keeps trying to restart. After it has been restarting continuously for an hour or so, I notice a NoClassDefFoundError for some classes in the jars that I load using the URL class loader, and the job never recovers after that, even once the root cause has been fixed (I had a Kafka source/sink in my job; Kafka was down temporarily and was brought back up afterwards).
> The jar is still available at the path referenced by the URL class loader and has not been tampered with.
>
> Could anyone please give me some pointers on why this could happen, what I could be missing here, or how I can debug this further?
>
> Thanks,
> Subbu
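The replies in this thread converge on one fix: any thread started from a RichMapFunction's open() has to be stopped in close(), before Flink closes the user-code class loader. The following is a minimal plain-Java sketch of that lifecycle, written so it runs without Flink on the classpath. The class `BackgroundWorkerFunction` and its methods are hypothetical; in a real job the same bodies would go into the `open(Configuration)` and `close()` overrides of your RichMapFunction. It assumes you own the background thread through an ExecutorService. If a third-party library spawns the thread internally, you would instead call whatever shutdown method that library exposes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stand-in for a RichMapFunction: open() starts a background
 * worker thread, close() stops it and waits for it to finish.
 */
public class BackgroundWorkerFunction {
    private ExecutorService worker;                      // the thread we own
    private final AtomicLong ticks = new AtomicLong();   // work done by the thread

    /** Mirrors RichFunction#open: start the background thread. */
    public void open() {
        ThreadFactory named = r -> {
            Thread t = new Thread(r, "background-worker");
            t.setDaemon(true); // a leaked thread will not keep the JVM alive
            return t;
        };
        worker = Executors.newSingleThreadExecutor(named);
        worker.submit(() -> {
            // Loop until shutdownNow() in close() interrupts this thread.
            while (!Thread.currentThread().isInterrupted()) {
                ticks.incrementAndGet();
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop exits
                }
            }
        });
    }

    /** Mirrors RichFunction#close: stop the thread before the class loader is closed. */
    public void close() throws InterruptedException {
        if (worker == null) {
            return; // open() was never called or failed early
        }
        worker.shutdownNow(); // interrupts the running task
        if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
            System.err.println("background worker did not stop within 5 seconds");
        }
    }

    public boolean isWorkerStopped() {
        return worker != null && worker.isTerminated();
    }

    public long ticks() {
        return ticks.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundWorkerFunction f = new BackgroundWorkerFunction();
        f.open();
        Thread.sleep(50); // let the worker run briefly
        f.close();
        System.out.println("stopped=" + f.isWorkerStopped() + ", ticks=" + f.ticks());
    }
}
```

shutdownNow() is used rather than shutdown() because the worker loops indefinitely; shutdown() alone would wait for a task that never completes. The bounded awaitTermination() keeps a misbehaving thread from hanging the shutdown, while still giving a well-behaved thread time to exit before its classes become unloadable.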