Hi all, I have a batch dataset and I want to compute some standard statistics for its columns (min, max, avg, etc.). To achieve this I wrote a simple program that uses SQL on the Table API, like the following:
SELECT MAX(col1), MIN(col1), AVG(col1),
       MAX(col2), MIN(col2), AVG(col2),
       MAX(col3), MIN(col3), AVG(col3)
FROM MYTABLE

My dataset has about 50 fields, so the query becomes quite big (and so does the job plan). This kind of job seems to cause the cluster to crash (too much garbage collection). Is there any smarter way to achieve this goal (apart from running one job per column)? Is this "normal", or is it a bug in Flink?

Best,
Flavio
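P.S. For reference, a minimal sketch of the kind of program I mean is below. The column list is just a placeholder, MYTABLE is assumed to be registered already, and the environment setup follows the current Table API, so it may look different depending on the Flink version:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

import java.util.ArrayList;
import java.util.List;

public class ColumnStats {
    public static void main(String[] args) {
        // Batch-mode table environment
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // MYTABLE is assumed to be registered already (e.g. via CREATE TABLE DDL or a catalog)

        // Build MAX/MIN/AVG expressions for every column; with ~50 columns this
        // yields ~150 aggregate expressions in a single SELECT
        String[] columns = {"col1", "col2", "col3" /* ... up to col50 */};
        List<String> aggs = new ArrayList<>();
        for (String c : columns) {
            aggs.add("MAX(" + c + ")");
            aggs.add("MIN(" + c + ")");
            aggs.add("AVG(" + c + ")");
        }
        String query = "SELECT " + String.join(", ", aggs) + " FROM MYTABLE";

        // Execute the query and print the single result row
        tEnv.executeSql(query).print();
    }
}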