Hi Konstantin,

Regarding 2): I've opened FLINK-5227 to update the documentation [1].
Regarding the Row type: The Row type was introduced for flink-table and was later used by other modules. There is FLINK-5186 to move Row and all the related TypeInfo (+serializer and comparator) to flink-core [2]. That should solve your issue.

Some of the connector modules which provide TableSources and TableSinks have dependencies on flink-table as well. I'll check that these are optional dependencies, to avoid pulling in Calcite through connectors for jobs that do not need it.

Thanks,
Fabian

[1] https://issues.apache.org/jira/browse/FLINK-5227
[2] https://issues.apache.org/jira/browse/FLINK-5186

2016-11-30 17:51 GMT+01:00 Konstantin Knauf <konstantin.kn...@tngtech.com>:

> Hi Stefan,
>
> unfortunately, I cannot share any heap dumps with you. I was able to
> resolve some of the issues myself today; the root causes were different
> for different jobs.
>
> 1) Jackson 2.7.2 (which comes with Flink) has a known class loading
> issue (see https://github.com/FasterXML/jackson-databind/issues/1363).
> Shipping a shaded version of Jackson 2.8.4 with our user code helped. I
> recommend upgrading Flink's Jackson version soon.
>
> 2) We have a dependency on flink-table [1], which ships with Calcite,
> including the Calcite JDBC driver, which cannot be collected because of
> the known problem with java.sql.DriverManager. Putting flink-table in
> Flink's lib dir instead of shipping it with the user code helps. You
> should update the documentation, because I think this will always
> happen when flink-table is used. So I actually wonder why this has not
> come up before.
>
> 3) Unresolved: Some threads in a custom source are not properly shut
> down and keep references to the UserCodeClassLoader. I have not had
> time to look into this issue so far.
>
> Cheers,
>
> Konstantin
>
> [1] Side note: We only need flink-table for the "Row" class used in the
> JdbcOutputFormat, so it might make sense to move this class somewhere
> else.
> Naturally, we also tried to exclude the "transitive" dependency on
> org.apache.calcite, until we noticed that Calcite is packaged inside
> flink-table, so you cannot even exclude it. What is the reason for
> this?
>
> On 30.11.2016 00:55, Stefan Richter wrote:
> > Hi,
> >
> > could you somehow provide us a heap dump from a TM that ran for a
> > while (ideally, shortly before an OOME)? This would greatly help us
> > to figure out if there is a classloader leak that causes the problem.
> >
> > Best,
> > Stefan
> >
> >> On 29.11.2016 at 18:39, Konstantin Knauf
> >> <konstantin.kn...@tngtech.com> wrote:
> >>
> >> Hi everyone,
> >>
> >> since upgrading to Flink 1.1.3 we observe frequent OOME PermGen
> >> TaskManager failures. Monitoring the PermGen size on one of the
> >> TaskManagers, you can see that each job (new jobs and restarts)
> >> adds a few MB which cannot be collected. Eventually, the OOME
> >> happens. This happens with all our jobs, streaming and batch, on
> >> YARN 2.4 as well as stand-alone.
> >>
> >> On Flink 1.0.2 this was not a problem, but I will investigate it
> >> further.
> >>
> >> The assumption is that Flink is somehow using one of the classes
> >> which comes with our jar and thereby prevents the GC of the whole
> >> classloader. Our jars do not include any Flink dependencies
> >> (compileOnly), but of course many others.
> >>
> >> Any ideas, anyone?
> >>
> >> Cheers and thank you,
> >>
> >> Konstantin
> >>
> >> Sent from my phone. Please excuse brevity and typos.
> >> ---
> >> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> >> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> >> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
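[Editor's note] The DriverManager problem Konstantin describes in point 2) is the classic JDBC classloader leak: java.sql.DriverManager lives in the system classloader and keeps a static registry of Driver instances, so a driver class loaded through the user-code classloader pins that classloader for the life of the JVM. A minimal sketch of the usual workaround, deregistering every driver that was loaded by a given classloader (the class name `DriverLeakGuard` is hypothetical, not a Flink API; in a Flink job one would call this from a close/teardown hook):

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

public final class DriverLeakGuard {

    /**
     * Deregister every JDBC driver whose class was loaded by the given
     * classloader, so DriverManager's static registry no longer pins it.
     * Returns the number of drivers removed (best effort).
     */
    public static int deregisterDriversOf(ClassLoader loader) {
        int removed = 0;
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            if (driver.getClass().getClassLoader() == loader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed++;
                } catch (SQLException e) {
                    // best effort: skip this driver and continue
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // With no user-code JDBC drivers registered, nothing is removed.
        int removed = deregisterDriversOf(DriverLeakGuard.class.getClassLoader());
        System.out.println("deregistered " + removed + " driver(s)");
    }
}
```

Note that DriverManager.getDrivers() only returns drivers visible to the caller's classloader, so this cleanup has to run from code inside the user-code classloader, not from Flink's lib classpath.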
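[Editor's note] For the unresolved point 3) (custom-source threads that outlive the job and keep a reference to the UserCodeClassLoader), the standard fix is the cooperative-shutdown pattern: a volatile running flag that cancellation flips, plus interrupt() and join() so the worker thread actually terminates before the classloader can be released. A minimal, Flink-free sketch of that pattern (class and method names are illustrative; in Flink the stop logic would live in a SourceFunction's cancel()):

```java
public final class PollingSource implements AutoCloseable {

    private volatile boolean running = true;
    private final Thread worker;

    public PollingSource() {
        worker = new Thread(() -> {
            while (running) {
                try {
                    // poll the external system here
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    // interrupted during shutdown: restore flag and exit
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "polling-source");
        worker.start();
    }

    /** Stop cooperatively and reap the thread so it cannot pin the classloader. */
    @Override
    public void close() throws InterruptedException {
        running = false;        // cooperative stop signal
        worker.interrupt();     // wake a blocked sleep/IO call
        worker.join(5_000);     // wait until the thread is really gone
    }

    public boolean isStopped() {
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        PollingSource source = new PollingSource();
        Thread.sleep(100);
        source.close();
        System.out.println("stopped: " + source.isStopped());
    }
}
```

The join() at the end is the part that matters for the leak: a thread that is merely signalled but never reaped still holds its context classloader and everything reachable from its stack.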