Hi everyone,

since upgrading to Flink 1.1.3, we have been seeing frequent TaskManager failures caused by OOME: PermGen space. Monitoring the PermGen size on one of the TaskManagers, we can see that each job (new submissions as well as restarts) adds a few MB that are never collected. Eventually, the OOME occurs. This happens with all of our jobs, streaming and batch, on YARN 2.4 as well as in standalone mode.
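For anyone who wants to reproduce the measurement, here is a minimal sketch that reads the class-metadata memory pool through the standard `java.lang.management` API. It assumes a HotSpot JVM, where that pool is called "... Perm Gen" up to Java 7 and "Metaspace" from Java 8 on. The class name `PermGenMonitor` is hypothetical, and the code has to run inside the TaskManager JVM (or be adapted to a remote JMX connection).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class PermGenMonitor {

    // Returns the bytes currently used by the class-metadata pool on HotSpot
    // ("PS Perm Gen", "CMS Perm Gen", ... up to Java 7; "Metaspace" on Java 8+),
    // or -1 if this JVM exposes no such pool.
    static long classMetadataUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.equals("Metaspace")) {
                return pool.getUsage().getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Log this once per job submission/restart; a value that keeps growing and
        // never drops after a full GC suggests that class loaders are being leaked.
        long used = classMetadataUsedBytes();
        System.out.printf("class metadata in use: %.1f MB%n", used / (1024.0 * 1024.0));
    }
}
```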
This was not a problem on Flink 1.0.2. I will investigate further, but my working assumption is that Flink somehow holds on to one of the classes that ship with our job jar and thereby prevents the whole user-code class loader from being garbage-collected. Our jars do not bundle any Flink dependencies (those are declared compileOnly), but of course they include many others. Any ideas, anyone?

Cheers and thank you,
Konstantin

Sent from my phone. Plz excuse brevity and tpyos.

---
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
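The suspected mechanism (one lingering reference to a user class pinning the entire class loader) can be illustrated in isolation. This is a minimal sketch, not Flink code. A fresh `ClassLoader` stands in for the job's user-code class loader. A `java.lang.reflect.Proxy` class defined in that loader stands in for a class from the job jar. The static `FRAMEWORK_CACHE` list is a hypothetical framework-side cache. While one instance of the user class is reachable from the cache, the loader and every class it defined stay in PermGen.

```java
import java.lang.ref.WeakReference;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderLeakDemo {

    // Hypothetical framework-side static cache that outlives individual jobs.
    static final List<Object> FRAMEWORK_CACHE = new ArrayList<>();

    // Simulates one job run and returns only a weak reference to its class loader.
    static WeakReference<ClassLoader> runJob(boolean leak) {
        // Empty loader whose parent is the bootstrap loader; plays the user-code class loader.
        ClassLoader userLoader = new ClassLoader(null) { };
        // Proxy.newProxyInstance defines a new proxy class in userLoader,
        // standing in for a class that ships with the job jar.
        Runnable userObject = (Runnable) Proxy.newProxyInstance(
                userLoader, new Class<?>[] {Runnable.class}, (proxy, method, methodArgs) -> null);
        if (leak) {
            // Reference chain: instance -> its Class -> userLoader -> every class userLoader defined.
            FRAMEWORK_CACHE.add(userObject);
        }
        return new WeakReference<>(userLoader);
    }

    // Requests GC a few times, since System.gc() is only a hint; true once the loader is gone.
    static boolean isCollected(WeakReference<ClassLoader> ref) {
        for (int attempt = 0; attempt < 10 && ref.get() != null; attempt++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<ClassLoader> leaked = runJob(true);
        WeakReference<ClassLoader> clean = runJob(false);
        System.out.println("clean loader collected:  " + isCollected(clean));
        System.out.println("leaked loader collected: " + isCollected(leaked));
        FRAMEWORK_CACHE.clear(); // drop the offending reference
        System.out.println("leaked loader collected after clear: " + isCollected(leaked));
    }
}
```

Because `System.gc()` is only a hint, the "collected" results depend on the JVM. On HotSpot with default settings it normally triggers a full collection that also unloads classes. The clean loader and, after `FRAMEWORK_CACHE.clear()`, the leaked one should then show up as collected.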