Re: OutOfMemory when looping on dataset filter
Hi Arnaud!
I assume you are using either a standalone setup, or a YARN session?
This looks to me as if classes cannot be properly garbage collected. Since each
job (each day is executed as a separate job) loads the classes again, the
PermGen space runs out.
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>
> at org.apache.hive.hcatalog.common.HCatUtil.deserialize(HCatUtil.java:117)
>
> at org.apache.hive.hcatalog.mapreduce. ... java:102)
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Friday, December 9, 2016 10:51
To: user@flink.apache.org
Subject: Re: OutOfMemory when looping on dataset filter
Hi Arnaud,
Flink does not cache data at the moment.
What happens is that for every day, the complete program is executed, i.e.,
also the program that computes wholeSet.
Each execution should be independent of the others, and all temporary data
should be cleaned up.
Since Flink executes programs in a pipelined fashion, intermediate data such as
wholeSet is not kept in memory between these executions.
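For illustration (this is not from the original reply, and the Event POJO with
its day field is a hypothetical placeholder): a minimal sketch of how the
per-day work could be done inside a single job with groupBy/reduceGroup,
instead of re-executing the whole program once per day.

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.Collector;

public class PerDayInOneJob {

    // Hypothetical event type: a "day of month" field plus some payload.
    public static class Event {
        public int day;
        public String payload;
        public Event() {}
        public Event(int day, String payload) { this.day = day; this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for [Select WholeSet]; in the real program this would be the HCatalog source.
        DataSet<Event> wholeSet = env.fromElements(
                new Event(1, "a"), new Event(1, "b"), new Event(2, "c"));

        // Group all events of the same day together and apply the per-day
        // treatment once per group, inside a single job execution.
        wholeSet
            .groupBy("day")
            .reduceGroup(new GroupReduceFunction<Event, String>() {
                @Override
                public void reduce(Iterable<Event> dayEvents, Collector<String> out) {
                    int count = 0;
                    for (Event e : dayEvents) {
                        count++; // replace with the actual per-day treatment
                    }
                    out.collect("events for this day: " + count);
                }
            })
            .print(); // print() triggers execution of the single job
    }
}

This only works if the per-day treatment can run inside a task, i.e. it does
not have to happen on the client as a true collect() would.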
Hello,
I have a non-distributed treatment to apply to a DataSet of timed events, one
day after another, in a Flink batch job.
My algorithm is:
// wholeSet is too big to fit in RAM with a collect(), so we cut it into pieces
DataSet wholeSet = [Select WholeSet];
for (day 1 to 31) {
    List<Event> dayEvents = wholeSet.filter(event belongs to day).collect();
    [apply the non-distributed treatment to dayEvents]
}
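For reference, a minimal runnable sketch of the loop described above (the Event
POJO and its day field are hypothetical placeholders). Note that every
collect() call triggers a separate, independent execution of the program,
including the part that computes wholeSet.

import java.util.List;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class PerDayLoop {

    // Hypothetical event type with a "day of month" field.
    public static class Event {
        public int day;
        public String payload;
        public Event() {}
        public Event(int day, String payload) { this.day = day; this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for [Select WholeSet]; in the real program this is the HCatalog source.
        DataSet<Event> wholeSet = env.fromElements(
                new Event(1, "a"), new Event(2, "b"), new Event(2, "c"));

        for (int day = 1; day <= 31; day++) {
            final int d = day; // copy for use in the lambda
            // Each collect() launches a new job that recomputes wholeSet,
            // filters it down to one day, and ships the result to the client.
            List<Event> dayEvents = wholeSet.filter(e -> e.day == d).collect();
            // apply the non-distributed treatment to dayEvents here
        }
    }
}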