Hi Robert,

to reproduce the issue, the job needs to actually use Jackson. The problematic field is a cache that is filled with every class that Jackson has serialized or deserialized.
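To illustrate why such a cache is a problem, here is a minimal sketch. It is not Jackson's or Flink's actual code: the class `ClassCacheLeakSketch` and all of its methods are hypothetical, and a `java.lang.reflect.Proxy` class stands in for a class from the user jar. The point is only that a static map keyed by `Class` objects holds a strong reference to each class's defining class loader, so the user code class loader cannot be collected while the entry exists.

```java
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; this is not Jackson's or Flink's code.
final class ClassCacheLeakSketch {

    // A static cache, as it might live in a library loaded by the framework's
    // class loader. Each Class key strongly references its defining loader.
    static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    // Simulates the library inspecting a type and caching the result.
    static String describe(Class<?> type) {
        return CACHE.computeIfAbsent(type, Class::getName);
    }

    // Creates a class defined by the given loader (stand-in for a user class).
    static Class<?> defineUserClass(ClassLoader userLoader) {
        Object proxy = Proxy.newProxyInstance(
                userLoader, new Class<?>[] {Runnable.class}, (p, m, a) -> null);
        return proxy.getClass();
    }

    // Counts the cache keys whose defining loader is the given loader.
    static long entriesPinning(ClassLoader loader) {
        return CACHE.keySet().stream()
                .filter(c -> c.getClassLoader() == loader)
                .count();
    }

    // One conceivable mitigation: drop a loader's entries when its job ends.
    static void evict(ClassLoader loader) {
        CACHE.keySet().removeIf(c -> c.getClassLoader() == loader);
    }

    public static void main(String[] args) {
        // Stand-in for the per-job user code class loader.
        ClassLoader userLoader =
                new ClassLoader(ClassCacheLeakSketch.class.getClassLoader()) {};
        describe(defineUserClass(userLoader));
        System.out.println("entries pinning the loader: " + entriesPinning(userLoader));
        evict(userLoader);
        System.out.println("after eviction: " + entriesPinning(userLoader));
    }
}
```

A job that never passes its own classes through the library never adds such an entry, which would explain why jobs that do not use Jackson do not show the leak.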
Best,
Konstantin

On 05.12.2016 11:55, Robert Metzger wrote:
> I've submitted Wordcount 410 times to a testing cluster and a streaming
> job 290 times and I could not reproduce the issue with 1.1.3. Also, the
> heap dump of one of the TaskManagers looked pretty normal.
>
> Do you have any ideas how to reproduce the issue?
>
> On Fri, Dec 2, 2016 at 3:21 PM, Robert Metzger <rmetz...@apache.org> wrote:
>
>> Thank you for reporting the issue, Konstantin.
>> I've filed a JIRA for the Jackson issue:
>> https://issues.apache.org/jira/browse/FLINK-5233
>> As I said in the JIRA, I propose to upgrade to Jackson 2.7.8, as
>> this version contains the fix for the issue, but it's not a major
>> Jackson upgrade.
>>
>> Any chance you could check whether 2.7.8 fixes the issue as well?
>>
>> On Fri, Dec 2, 2016 at 11:12 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi Konstantin,
>>>
>>> Regarding 2): I've opened FLINK-5227 to update the documentation [1].
>>>
>>> Regarding the Row type: The Row type was introduced for flink-table
>>> and was later used by other modules. There is FLINK-5186 to move Row
>>> and all the related TypeInfo (+ serializer and comparator) to
>>> flink-core [2]. That should solve your issue.
>>>
>>> Some of the connector modules which provide TableSources and
>>> TableSinks have dependencies on flink-table as well. I'll check that
>>> these are optional dependencies, so that we do not pull in Calcite
>>> through connectors for jobs that do not need it.
>>>
>>> Thanks,
>>> Fabian
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-5227
>>> [2] https://issues.apache.org/jira/browse/FLINK-5186
>>>
>>> 2016-11-30 17:51 GMT+01:00 Konstantin Knauf <konstantin.kn...@tngtech.com>:
>>>
>>>> Hi Stefan,
>>>>
>>>> unfortunately, I cannot share any heap dumps with you. I was able
>>>> to resolve some of the issues myself today; the root causes were
>>>> different for different jobs.
>>>>
>>>> 1) Jackson 2.7.2 (which comes with Flink) has a known class loading
>>>> issue (see https://github.com/FasterXML/jackson-databind/issues/1363).
>>>> Shipping a shaded version of Jackson 2.8.4 with our user code
>>>> helped. I recommend upgrading Flink's Jackson version soon.
>>>>
>>>> 2) We have a dependency on flink-table [1], which ships with
>>>> Calcite including the Calcite JDBC driver, which cannot be
>>>> garbage-collected because of the known problem with
>>>> java.sql.DriverManager. Putting flink-table in Flink's lib dir
>>>> instead of shipping it with the user code helps. You should update
>>>> the documentation, because I think this will always happen when
>>>> using flink-table. So I wonder why this has not come up before.
>>>>
>>>> 3) Unresolved: Some threads in a custom source are not properly
>>>> shut down and keep references to the UserCodeClassLoader. I have
>>>> not had time to look into this issue so far.
>>>>
>>>> Cheers,
>>>>
>>>> Konstantin
>>>>
>>>> [1] Side note: We only need flink-table for the "Row" class used in
>>>> the JdbcOutputFormat, so it might make sense to move this class
>>>> somewhere else. Naturally, we also tried to exclude the
>>>> "transitive" dependency on org.apache.calcite until we noticed that
>>>> Calcite is packaged with flink-table, so that you cannot even
>>>> exclude it. What is the reason for this?
>>>>
>>>> On 30.11.2016 00:55, Stefan Richter wrote:
>>>>> Hi,
>>>>>
>>>>> could you somehow provide us a heap dump from a TM that ran for a
>>>>> while (ideally, shortly before an OOME)? This would greatly help
>>>>> us to figure out if there is a class loader leak that causes the
>>>>> problem.
>>>>>
>>>>> Best,
>>>>> Stefan
>>>>>
>>>>> On 29.11.2016 at 18:39, Konstantin Knauf
>>>>> <konstantin.kn...@tngtech.com> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> since upgrading to Flink 1.1.3 we observe frequent TaskManager
>>>>>> failures due to PermGen OOMEs. Monitoring the PermGen size on
>>>>>> one of the TaskManagers, you can see that each job (new jobs and
>>>>>> restarts) adds a few MB, which cannot be collected. Eventually,
>>>>>> the OOME happens. This happens with all our jobs, streaming and
>>>>>> batch, on YARN 2.4 as well as stand-alone.
>>>>>>
>>>>>> On Flink 1.0.2 this was not a problem, but I will investigate it
>>>>>> further.
>>>>>>
>>>>>> The assumption is that Flink is somehow using one of the classes
>>>>>> that come with our jar and thereby prevents the GC of the whole
>>>>>> class loader. Our jars do not include any Flink dependencies
>>>>>> (compileOnly), but of course many others.
>>>>>>
>>>>>> Any ideas anyone?
>>>>>>
>>>>>> Cheers and thank you,
>>>>>>
>>>>>> Konstantin
>>>>>>
>>>>>> sent from my phone. Plz excuse brevity and tpyos.
>>>>>> ---
>>>>>> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
>>>>>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>>>>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>>>
>>>> --
>>>> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
>>>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>>> Sitz: Unterföhring * Amtsgericht München * HRB 135082

--
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082