Hi,

You could try to enable object reuse (see the sketch below). Alternatively, you can give the TaskManagers more heap memory or fine-tune the GC parameters.
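A minimal sketch of enabling object reuse on the batch ExecutionEnvironment (the class name is illustrative, and the config keys in the comments assume a standalone cluster configured via flink-conf.yaml). Note that object reuse is only safe if the user functions do not hold on to input records across invocations:

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class ObjectReuseSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Let the runtime reuse record instances instead of allocating a new
        // object for every record, which reduces garbage-collection pressure.
        env.getConfig().enableObjectReuse();

        // Heap size and GC flags are set on the cluster side, e.g. in
        // flink-conf.yaml:
        //   taskmanager.heap.size: 4096m   (taskmanager.heap.mb on older versions)
        //   env.java.opts: -XX:+UseG1GC

        // ... define and execute the batch job on this environment ...
    }
}
```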
I would not consider it a bug in Flink, but it might be something that could be improved.

Fabian

On Wed, Nov 28, 2018 at 18:19, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Hi to all,
> I have a batch dataset and I want to get some standard info about its
> columns (like min, max, avg, etc.).
> To achieve this I wrote a simple program that uses SQL on the Table API,
> like the following:
>
> SELECT
>   MAX(col1), MIN(col1), AVG(col1),
>   MAX(col2), MIN(col2), AVG(col2),
>   MAX(col3), MIN(col3), AVG(col3)
> FROM MYTABLE
>
> My dataset has about 50 fields, so the query becomes quite big (and the
> job plan too).
> This kind of job seems to cause the cluster to crash (too much garbage
> collection).
> Is there any smarter way to achieve this goal (apart from running one job
> per column)?
> Is this "normal" or is this a bug in Flink?
>
> Best,
> Flavio
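For reference, a minimal end-to-end sketch of the kind of job described above on the 2018-era batch Table API. The three-column tuple input is a stand-in for the real ~50-field dataset; the table and column names follow the query in the quoted mail:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class ColumnStatsSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().enableObjectReuse(); // as suggested above
        BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // Tiny stand-in dataset; the real input has ~50 columns.
        DataSet<Tuple3<Integer, Integer, Integer>> input = env.fromElements(
                Tuple3.of(1, 10, 100),
                Tuple3.of(2, 20, 200),
                Tuple3.of(3, 30, 300));
        tEnv.registerDataSet("MYTABLE", input, "col1, col2, col3");

        // One wide aggregation over all columns; with ~50 fields this query
        // (and the resulting job plan) grows accordingly.
        Table stats = tEnv.sqlQuery(
                "SELECT "
                + "MAX(col1), MIN(col1), AVG(col1), "
                + "MAX(col2), MIN(col2), AVG(col2), "
                + "MAX(col3), MIN(col3), AVG(col3) "
                + "FROM MYTABLE");

        // print() triggers execution of the batch job.
        tEnv.toDataSet(stats, Row.class).print();
    }
}
```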