Re: Flink 1.1.3 OOME Permgen

2016-12-05 Thread Konstantin Knauf
Yep, I would suppose so. You need to have the reference from the AppClassLoader to the UserCodeClassLoader. On 05.12.2016 12:37, Robert Metzger wrote: > I executed this snippet in each Flink job: > > @Override > public void open(Configuration config) { > ObjectMapper somethingWithJackson = new

Re: Flink 1.1.3 OOME Permgen

2016-12-05 Thread Robert Metzger
I executed this snippet in each Flink job: @Override public void open(Configuration config) { ObjectMapper somethingWithJackson = new ObjectMapper(); try { ObjectNode on = somethingWithJackson.readValue("{\"a\": \"b\"}", ObjectNode.class); } catch (IOException e) { throw new RuntimeE

Re: Flink 1.1.3 OOME Permgen

2016-12-05 Thread Konstantin Knauf
Hi Robert, you need to actually use Jackson. The problematic field is a cache that is filled with every class Jackson has serialized or deserialized. Best, Konstantin On 05.12.2016 11:55, Robert Metzger wrote: > I've submitted Wordcount 410 times to a testing cluster and a streaming > j
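
The leak pattern described here (a cache held by a class from the application class loader that keeps references to classes from per-job class loaders) can be illustrated with a minimal, self-contained sketch. It does not use Jackson or Flink: `TYPE_CACHE`, `runJob` and `leakedLoaders` are hypothetical names, and `java.lang.reflect.Proxy` is only used as a convenient way to define a class inside a fresh class loader.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ClassLoaderLeakSketch {

    // Stand-in for a library-level cache keyed by Class. It lives in a class
    // loaded by the application class loader (hypothetical, not Jackson code).
    static final Map<Class<?>, String> TYPE_CACHE = new ConcurrentHashMap<>();

    // Simulates one job submission: a fresh class loader defines a class,
    // and the "library" caches metadata about that class.
    static ClassLoader runJob() {
        ClassLoader jobLoader = new URLClassLoader(
                new URL[0], ClassLoaderLeakSketch.class.getClassLoader());
        InvocationHandler noop = (proxy, method, methodArgs) -> null;
        // The generated proxy class is defined by jobLoader.
        Object userObject = Proxy.newProxyInstance(
                jobLoader, new Class<?>[] {Runnable.class}, noop);
        Class<?> userClass = userObject.getClass();
        // The cache now strongly references userClass, and through it jobLoader.
        TYPE_CACHE.put(userClass, userClass.getName());
        return jobLoader;
    }

    // Counts the distinct class loaders that remain reachable via the cache.
    static long leakedLoaders() {
        return TYPE_CACHE.keySet().stream()
                .map(Class::getClassLoader)
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            runJob();
        }
        System.out.println("cached classes: " + TYPE_CACHE.size());
        System.out.println("job class loaders still reachable: " + leakedLoaders());
    }
}
```

Each simulated job leaves one more class loader reachable from the static map, so neither the loader nor the classes it defined can be unloaded. A job that never puts one of its own classes into such a cache would not show the effect.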

Re: Flink 1.1.3 OOME Permgen

2016-12-05 Thread Robert Metzger
I've submitted Wordcount 410 times to a testing cluster and a streaming job 290 times and I could not reproduce the issue with 1.1.3. Also, the heap dump of one of the TaskManagers looked pretty normal. Do you have any idea how to reproduce the issue? On Fri, Dec 2, 2016 at 3:21 PM, Robert Metzge

Re: Flink 1.1.3 OOME Permgen

2016-12-02 Thread Robert Metzger
Thank you for reporting the issue, Konstantin. I've filed a JIRA for the Jackson issue: https://issues.apache.org/jira/browse/FLINK-5233. As I said in the JIRA, I propose to upgrade to Jackson 2.7.8, as this version contains the fix for the issue, but it's not a major Jackson upgrade. Any chance you

Re: Flink 1.1.3 OOME Permgen

2016-12-02 Thread Fabian Hueske
Hi Konstantin, Regarding 2): I've opened FLINK-5227 to update the documentation [1]. Regarding the Row type: The Row type was introduced for flink-table and was later used by other modules. There is FLINK-5186 to move Row and all the related TypeInfo (+serializer and comparator) to flink-core [2]

Re: Flink 1.1.3 OOME Permgen

2016-11-30 Thread Konstantin Knauf
Hi Stefan, unfortunately, I cannot share any heap dumps with you. I was able to resolve some of the issues myself today; the root causes were different for different jobs. 1) Jackson 2.7.2 (which comes with Flink) has a known class loading issue (see https://github.com/FasterXML/jackson-databin

Re: Flink 1.1.3 OOME Permgen

2016-11-29 Thread Stefan Richter
Hi, could you somehow provide us with a heap dump from a TM that has run for a while (ideally, shortly before an OOME)? This would greatly help us to figure out if there is a classloader leak that causes the problem. Best, Stefan > On 29.11.2016 at 18:39, Konstantin Knauf wrote: > > Hi everyone,

Flink 1.1.3 OOME Permgen

2016-11-29 Thread Konstantin Knauf
Hi everyone, since upgrading to Flink 1.1.3 we observe frequent PermGen OOME TaskManager failures. When monitoring the PermGen size on one of the TaskManagers, you can see that each job (new jobs and restarts) adds a few MB that cannot be collected. Eventually, the OOME happens. This happens with
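
One way to make the growth described above visible from inside a JVM is to read the class metadata memory pool through the standard `java.lang.management` API. The following is a minimal sketch, not part of Flink: the class and method names are hypothetical, and the pool names it matches ("Perm Gen" on Java 7 and earlier, "Metaspace" on Java 8 and later) are HotSpot conventions that other JVMs may not share.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class ClassMetadataUsage {

    // Sums the used bytes of the pools that hold class metadata.
    // Pool names are JVM-specific; the two matched here are assumptions
    // based on HotSpot naming.
    static long classMetadataBytesUsed() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // null if the pool is no longer valid
                    total += usage.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
        System.out.println("class metadata bytes used: " + classMetadataBytesUsed());
        System.out.println("classes currently loaded:  " + classLoading.getLoadedClassCount());
        System.out.println("classes unloaded so far:   " + classLoading.getUnloadedClassCount());
    }
}
```

Logging these values before and after each job submission would show the pattern from the report: used bytes and the loaded class count keep rising while the unloaded count stays flat.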