Hi, could you provide us with a heap dump from a TM that has been running for a while (ideally taken shortly before an OOME)? This would greatly help us figure out whether a classloader leak is causing the problem.
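
In case it is useful, here is a minimal sketch of one way to write such a dump from inside a running HotSpot JVM, using the standard `HotSpotDiagnosticMXBean`. The class name `HeapDumper`, the method name, and the output path are hypothetical, and the code would have to run inside the TM process itself, which is an assumption about your setup. From the outside, `jmap -dump:live,format=b,file=<file> <pid>` or the JVM flag `-XX:+HeapDumpOnOutOfMemoryError` should produce an equivalent `.hprof` file.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Flink: dumps the heap of the JVM it runs in.
public class HeapDumper {

    /**
     * Writes an .hprof heap dump of the current JVM to the given path.
     * The target file must not exist yet, otherwise dumpHeap throws an IOException.
     *
     * @param path     output file, should end in ".hprof"
     * @param liveOnly if true, only objects reachable after a GC are dumped
     */
    public static File dumpHeap(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveOnly);
        return new File(path);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location: a fresh file name in the JVM's temp directory.
        String path = args.length > 0
                ? args[0]
                : new File(System.getProperty("java.io.tmpdir"),
                           "tm-heap-" + System.nanoTime() + ".hprof").getAbsolutePath();
        File dump = dumpHeap(path, true);
        System.out.println("Wrote " + dump.length() + " bytes to " + dump.getAbsolutePath());
    }
}
```

Passing `liveOnly = true` keeps the dump smaller and drops unreachable objects, which is usually what you want when looking for whatever still references a class loader.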

Best,
Stefan

> On 29.11.2016 at 18:39, Konstantin Knauf <konstantin.kn...@tngtech.com> wrote:
>
> Hi everyone,
>
> since upgrading to Flink 1.1.3 we have been observing frequent OOME (PermGen) TaskManager failures. Monitoring the PermGen size on one of the TaskManagers, you can see that each job (new jobs and restarts) adds a few MB that cannot be collected. Eventually, the OOME happens. This happens with all our jobs, streaming and batch, on YARN 2.4 as well as standalone.
>
> On Flink 1.0.2 this was not a problem, but I will investigate it further.
>
> The assumption is that Flink is somehow using one of the classes that come with our jar and thereby prevents the GC of the whole class loader. Our jars do not include any Flink dependencies though (compileOnly), but of course many others.
>
> Any ideas anyone?
>
> Cheers and thank you,
>
> Konstantin
>
> sent from my phone. Plz excuse brevity and tpyos.
> ---
> Konstantin Knauf *konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke