Hi Subramanyam,

1.7.3 is not released yet. You would need to cherry-pick these fixes if you really need them.

Regards,
Dian

> On September 25, 2019, at 12:08 AM, Zhu Zhu <reed...@gmail.com> wrote:
>
> Hi Subramanyam,
>
> I checked the commits.
> There are 2 fixes in FLINK-10455; only releases 1.8.1 and 1.9.0
> contain both of them.
>
> Thanks,
> Zhu Zhu
>
> Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote on Tue, Sep 24, 2019 at 11:02 PM:
>
> Hi Zhu,
>
> We also use FlinkKafkaProducer(011), hence I felt this fix would also be
> needed for us.
>
> I agree that the issue I originally mentioned would not be fixed by this,
> but I felt that I should be consuming this fix as well.
>
> Thanks,
> Subbu
>
> From: Zhu Zhu [mailto:reed...@gmail.com]
> Sent: Tuesday, September 24, 2019 6:13 PM
> To: Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> Cc: Dian Fu <dian0511...@gmail.com>; user@flink.apache.org
> Subject: Re: NoClassDefFoundError in failing-restarting job that uses url classloader
>
> Hi Subramanyam,
>
> I think you do not need the fix in FLINK-10455, which is for Kafka only. It is
> just similar to the issue you met.
>
> As you said, we need to make sure that the threads spawned by the operator/UDF
> are stopped in the close() method. That way, we avoid a thread throwing
> NoClassDefFoundError because the class loader has been closed.
>
> Thanks,
> Zhu Zhu
>
> Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote on Tue, Sep 24, 2019 at 8:07 PM:
>
> Hi,
>
> Thank you.
>
> I think the takeaway for us is that we need to make sure that the threads are
> stopped in the close() method.
>
> With regard to FLINK-10455, I see that the fix versions listed are: 1.5.6, 1.7.0,
> 1.7.3, 1.8.1, 1.9.0
>
> However, I’m unable to find 1.7.3 on the downloads
> page (https://flink.apache.org/downloads.html). Is it yet to be released, or
> am I perhaps not looking in the right place?
>
> We’re currently using 1.7.2. Could you please let me know the minimal
> upgrade I need in order to consume the fix for FLINK-10455?
>
> Thanks,
> Subbu
>
> From: Dian Fu [mailto:dian0511...@gmail.com]
> Sent: Monday, September 23, 2019 1:54 PM
> To: Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> Cc: Zhu Zhu <reed...@gmail.com>; user@flink.apache.org
> Subject: Re: NoClassDefFoundError in failing-restarting job that uses url classloader
>
> Hi Subbu,
>
> The issue you encountered is very similar to the one fixed
> in FLINK-10455 [1]. Could you check whether that fix solves your problem? The
> root cause of that issue is that the close() method did not close
> everything. After close() is called, the classloader
> (URLClassLoader) is closed. If a thread is still running after
> close() has been called, it may access classes in the user-provided jars.
> However, as the URLClassLoader has already been closed, a NoClassDefFoundError
> is thrown.
>
> Regards,
> Dian
>
> [1] https://issues.apache.org/jira/browse/FLINK-10455
>
> On September 23, 2019, at 2:50 PM, Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote:
>
> Hi,
>
> I was able to simulate the issue again and understand the cause a little
> better.
>
> The issue occurs when:
>
> - One of the RichMapFunction transformations uses a third-party
>   library in the open() method that spawns a thread.
> - The thread doesn’t get properly closed in the close() method.
> - Once the job starts failing, we start seeing a NoClassDefFoundError
>   from that thread.
>
> I understand that cleanup should be done in the close() method. However, I just
> wanted to know: is there a configuration setting that would
> help us clean up such threads?
>
> I can attach the code if required.
>
> Thanks,
> Subbu
>
> From: Zhu Zhu [mailto:reed...@gmail.com]
> Sent: Friday, August 9, 2019 7:43 AM
> To: Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
> Cc: user@flink.apache.org
> Subject: Re: NoClassDefFoundError in failing-restarting job that uses url classloader
>
> Hi Subramanyam,
>
> Could you share more information, including:
>
> 1. the URL pattern
> 2. the detailed exception and the log around it
> 3. the cluster the job is running on, e.g. standalone, YARN, k8s
> 4. whether it's session mode or per-job mode
>
> This information would be helpful to identify the failure cause.
>
> Thanks,
> Zhu Zhu
>
> Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com> wrote on Fri, Aug 9, 2019 at 1:45 AM:
>
> Hello,
>
> I'm currently using Flink 1.7.2.
>
> I'm trying to run a job that's submitted programmatically using the
> ClusterClient API.
>
>     public JobSubmissionResult run(PackagedProgram prog, int parallelism)
>
> The job makes use of some jars which I add to the packaged program through
> the PackagedProgram constructor, along with the jar file.
>
>     public PackagedProgram(File jarFile, List<URL> classpaths, String... args)
>
> Normally, this works perfectly and the job runs fine.
>
> However, if there's an error in the job, and the job goes into the failing state
> and keeps trying to restart for an hour or so, I
> notice a NoClassDefFoundError for some classes in the jars that I load using
> the URL class loader, and the job never recovers after that, even if the root
> cause of the issue was fixed (I had a Kafka source/sink in my job, and Kafka
> was down temporarily and was brought up after that).
>
> The jar is still available at the path referenced by the URL classloader and
> has not been tampered with.
>
> Could anyone please give me some pointers as to why this
> could happen, what I could be missing here, or how I can debug further?
>
> Thanks,
> Subbu
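
The advice repeated in this thread (stop any thread started in open() from within close()) can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink code: the class ThreadOwningFunction, its method names, the thread name, and the 5-second timeout are all hypothetical, and a single-thread ExecutorService stands in for whatever thread the third-party library spawns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stand-in for a user function (for example a RichMapFunction)
 * whose open() starts a background thread. Plain JDK only, no Flink API.
 */
class ThreadOwningFunction {
    private ExecutorService worker;
    private final AtomicLong ticks = new AtomicLong();

    /** Mirrors open(): start the background work. */
    public void open() {
        worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "udf-background-worker");
            t.setDaemon(true);
            return t;
        });
        worker.submit(() -> {
            // Keep working until close() interrupts this thread.
            while (!Thread.currentThread().isInterrupted()) {
                ticks.incrementAndGet();
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the loop condition ends the task.
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    /** Mirrors close(): interrupt the thread and wait until it has finished. */
    public void close() {
        if (worker == null) {
            return;
        }
        worker.shutdownNow(); // interrupts the running task
        try {
            // Assumed timeout; pick a value that suits the real library.
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("background worker did not stop in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean isWorkerTerminated() {
        return worker != null && worker.isTerminated();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadOwningFunction function = new ThreadOwningFunction();
        function.open();
        Thread.sleep(50); // let the background thread run briefly
        function.close();
        System.out.println("worker terminated: " + function.isWorkerTerminated());
    }
}
```

Waiting for termination inside close(), rather than only requesting it, is the point of the sketch: once close() returns, no thread owned by the function is left to touch classes from a class loader that is about to be closed.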