I'd try to tune it in a single query. If that does not work, go for as few queries as possible, splitting by column for better projection push-down.
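For example, a rough sketch of the second option (untested; it assumes the Flink 1.7-era Table API with a BatchTableEnvironment, the MYTABLE from the query quoted below, and an arbitrary group size):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class ColumnStatsJob {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);
    // ... register MYTABLE here, e.g. tEnv.registerTable("MYTABLE", myTable);

    String[] columns = {"col1", "col2", "col3"}; // ... list all ~50 columns
    int groupSize = 10; // columns aggregated per query; tune to taste

    for (int i = 0; i < columns.length; i += groupSize) {
      StringBuilder sql = new StringBuilder("SELECT ");
      for (int j = i; j < Math.min(i + groupSize, columns.length); j++) {
        if (j > i) {
          sql.append(", ");
        }
        String c = columns[j];
        sql.append("MAX(").append(c).append("), MIN(").append(c)
           .append("), AVG(").append(c).append(")");
      }
      sql.append(" FROM MYTABLE");

      Table stats = tEnv.sqlQuery(sql.toString());
      DataSet<Row> result = tEnv.toDataSet(stats, Row.class);
      result.print(); // one (smaller) job per column group
    }
  }
}

Each print() triggers a separate, much smaller job, and because each query only references a subset of the columns, projection push-down can prune the rest at the source.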
This is the first time I hear somebody requesting ANALYZE TABLE. I don't
see a reason why it shouldn't be added in the future.

On Thu, Nov 29, 2018 at 12:08 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:

> What do you advise to compute column stats?
> Should I run multiple jobs (one per column) or try to compute all at once?
>
> Are you ever going to consider supporting ANALYZE TABLE (like in Hive or
> Spark) in the Flink Table API?
>
> Best,
> Flavio
>
> On Thu, Nov 29, 2018 at 9:45 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi,
>>
>> You could try to enable object reuse.
>> Alternatively, you can give it more heap memory or fine-tune the GC
>> parameters.
>>
>> I would not consider it a bug in Flink, but it might be something that
>> could be improved.
>>
>> Fabian
>>
>> On Wed, Nov 28, 2018 at 6:19 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>>> Hi to all,
>>> I have a batch dataset and I want to get some standard info about its
>>> columns (like min, max, avg, etc.).
>>> To achieve this I wrote a simple program that uses SQL on the Table API,
>>> like the following:
>>>
>>> SELECT
>>>   MAX(col1), MIN(col1), AVG(col1),
>>>   MAX(col2), MIN(col2), AVG(col2),
>>>   MAX(col3), MIN(col3), AVG(col3)
>>> FROM MYTABLE
>>>
>>> My dataset has about 50 fields, so the query (and the job plan) becomes
>>> quite big.
>>> It seems that this kind of job causes the cluster to crash (too much
>>> garbage collection).
>>> Is there any smarter way to achieve this goal (apart from running one
>>> job per column)?
>>> Is this "normal" or is it a bug in Flink?
>>>
>>> Best,
>>> Flavio
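For reference, the object-reuse setting suggested further up in the thread is a one-line switch on the ExecutionConfig; a minimal sketch (untested, against the Flink 1.7-era DataSet API):

import org.apache.flink.api.java.ExecutionEnvironment;

public class ObjectReuseSetup {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Let Flink reuse record objects between operators instead of
    // allocating a new instance per record, which reduces GC pressure.
    env.getConfig().enableObjectReuse();

    // ... build the TableEnvironment and run the stats queries as above ...
  }
}

Giving the TaskManagers more heap (taskmanager.heap.size in flink-conf.yaml, if I recall the key correctly) or tuning the JVM GC options are the other knobs mentioned above.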