RE: OutOfMemory when looping on dataset filter

2016-12-09 Thread LINZ, Arnaud
Hi Arnaud! I assume you are using either a standalone setup or a YARN session? This looks to me as if classes cannot be properly garbage collected. Since each job (each day is executed as a separate job) loads the classes again, the PermGen space runs out.
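The snippet is cut off here. As context for the diagnosis above, one common mitigation when repeated job submissions exhaust PermGen is to give the Flink JVMs more PermGen headroom via env.java.opts in flink-conf.yaml. This is a minimal sketch, assuming a Java 7 JVM (Java 8 replaced PermGen with a natively growing Metaspace); it buys headroom but does not fix the underlying class-unloading problem:

    # flink-conf.yaml: extra JVM options passed to the JobManager and
    # TaskManager JVMs at startup. Raising the PermGen ceiling only buys
    # headroom; classes loaded per job still accumulate until unloaded.
    env.java.opts: "-XX:MaxPermSize=512m"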

Re: OutOfMemory when looping on dataset filter

2016-12-09 Thread Stephan Ewen
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.hive.hcatalog.common.HCatUtil.deserialize(HCatUtil.java:117)
at org.apache.hive.hcatalog.mapreduce.

RE: OutOfMemory when looping on dataset filter

2016-12-09 Thread LINZ, Arnaud
java:102)
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Friday, December 9, 2016, 10:51
To: user@flink.apache.org
Subject: Re: OutOfMemory when looping on dataset filter
Hi Arnaud, Flink does not cache data at the moment. What happens is that for every day, the complete program is executed

Re: OutOfMemory when looping on dataset filter

2016-12-09 Thread Fabian Hueske
Hi Arnaud, Flink does not cache data at the moment. What happens is that for every day, the complete program is executed, i.e., also the program that computes wholeSet. Each execution should be independent of the others, and all temporary data should be cleaned up. Since Flink executes programs in a pipelined fashion
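Fabian's message is truncated here. To make the execution model concrete, below is a minimal sketch, not Fabian's actual suggestion, of one way to avoid submitting 31 separate jobs: build all per-day filters into a single plan with one sink per day, so the source is scanned once and there is a single job submission. The Event POJO and the output paths are hypothetical stand-ins for what the thread does not show.

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class AllDaysOneJob {

        // Hypothetical event POJO; the real record type is not in the thread.
        public static class Event {
            public int day;
            public String payload;
            public Event() {}
            public Event(int day, String payload) { this.day = day; this.payload = payload; }

            @Override
            public String toString() { return day + "," + payload; }
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Stand-in for "[Select WholeSet]"; any real source behaves the same.
            DataSet<Event> wholeSet = env.fromElements(new Event(5, "x"), new Event(12, "y"));

            // All 31 filter branches go into one plan, each with its own sink,
            // so the source is scanned once and only one job is submitted.
            for (int day = 1; day <= 31; day++) {
                final int d = day;
                wholeSet.filter(e -> e.day == d)
                        .writeAsText("/tmp/events/day-" + d);   // hypothetical output path
            }

            env.execute("all-days-in-one-job");
        }
    }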

OutOfMemory when looping on dataset filter

2016-12-09 Thread LINZ, Arnaud
Hello, I have a non-distributed treatment to apply to a DataSet of timed events, one day after another, in a Flink batch job. My algorithm is:

// wholeSet is too big to fit in RAM with a collect(), so we cut it in pieces
DataSet wholeSet = [Select WholeSet];
for (day = 1 to 31) {
    List<>
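The pseudocode is cut off at the collect step. Putting the thread together (a per-day filter, then a collect() for the local treatment), a minimal runnable reconstruction might look like the following; the Event POJO, its day field, and the per-day treatment are assumptions for illustration, not Arnaud's actual code.

    import java.util.List;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class DailyLoop {

        // Hypothetical event POJO; the real record type is not in the thread.
        public static class Event {
            public int day;
            public String payload;
            public Event() {}
            public Event(int day, String payload) { this.day = day; this.payload = payload; }
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Stand-in for "[Select WholeSet]"; any real source behaves the same.
            DataSet<Event> wholeSet = env.fromElements(new Event(5, "x"), new Event(12, "y"));

            for (int day = 1; day <= 31; day++) {
                final int d = day;
                // Each collect() submits a separate job, so everything behind
                // wholeSet is recomputed from scratch on every iteration.
                // This is the per-day re-execution described in the replies above.
                List<Event> dayEvents = wholeSet.filter(e -> e.day == d).collect();
                // ... apply the non-distributed, per-day treatment to dayEvents ...
            }
        }
    }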