Great, thanks!

On Tue, Dec 18, 2018 at 3:26 AM Kurt Young <ykt...@gmail.com> wrote:
> Hi,
>
> We have implemented ANALYZE TABLE in our internal version of Flink, and we
> will try to contribute it back to the community.
>
> Best,
> Kurt
>
>
> On Thu, Nov 29, 2018 at 9:23 PM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> I'd try to tune it in a single query.
>> If that does not work, go for as few queries as possible, splitting by
>> column for better projection push-down.
>>
>> This is the first time I've heard somebody request ANALYZE TABLE.
>> I don't see a reason why it shouldn't be added in the future.
>>
>>
>> On Thu, Nov 29, 2018 at 12:08 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>>> What do you advise for computing column stats?
>>> Should I run multiple jobs (one per column) or try to compute all at once?
>>>
>>> Are you ever going to consider supporting ANALYZE TABLE (like in Hive or
>>> Spark) in the Flink Table API?
>>>
>>> Best,
>>> Flavio
>>>
>>> On Thu, Nov 29, 2018 at 9:45 AM Fabian Hueske <fhue...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> You could try to enable object reuse.
>>>> Alternatively, you can give it more heap memory or fine-tune the GC
>>>> parameters.
>>>>
>>>> I would not consider it a bug in Flink, but it might be something that
>>>> could be improved.
>>>>
>>>> Fabian
>>>>
>>>>
>>>> On Wed, Nov 28, 2018 at 6:19 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>
>>>>> Hi to all,
>>>>> I have a batch dataset and I want to get some standard info about its
>>>>> columns (like min, max, avg, etc.).
>>>>> In order to achieve this I wrote a simple program that uses SQL on the
>>>>> Table API, like the following:
>>>>>
>>>>> SELECT
>>>>> MAX(col1), MIN(col1), AVG(col1),
>>>>> MAX(col2), MIN(col2), AVG(col2),
>>>>> MAX(col3), MIN(col3), AVG(col3)
>>>>> FROM MYTABLE
>>>>>
>>>>> In my dataset I have about 50 fields, and the query becomes quite big
>>>>> (and so does the job plan).
>>>>> It seems that this kind of job causes the cluster to crash (too much
>>>>> garbage collection).
>>>>> Is there any smarter way to achieve this goal (apart from running a
>>>>> job per column)?
>>>>> Is this "normal" or is this a bug in Flink?
>>>>>
>>>>> Best,
>>>>> Flavio
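With about 50 columns and three aggregates each, the query in the original question has around 150 select items, which is tedious to write by hand. Fabian's advice (first try a single query; if that still exhausts memory, use as few queries as possible, each over a subset of columns) can be scripted. The Python sketch below only builds the SQL strings; the function names, the `colN_max`-style aliases and the batch size are hypothetical choices, and actually submitting the queries to Flink is left out. Whether a given split reduces GC pressure is an assumption to test on the real job, not a guarantee.

```python
from typing import List, Sequence

# Aggregates taken from the original question; extend the tuple as needed.
AGGREGATES = ("MAX", "MIN", "AVG")


def column_stats_query(table: str, columns: Sequence[str]) -> str:
    """Build one SELECT computing every aggregate in AGGREGATES for each column.

    The aliases (e.g. col1_max) are a hypothetical naming scheme so that
    the result fields are easy to map back to their source columns.
    """
    select_items = [
        f"{agg}({col}) AS {col}_{agg.lower()}"
        for col in columns
        for agg in AGGREGATES
    ]
    return "SELECT\n  " + ",\n  ".join(select_items) + f"\nFROM {table}"


def batched_stats_queries(
    table: str, columns: Sequence[str], batch_size: int
) -> List[str]:
    """Split the columns into consecutive groups of batch_size columns
    and build one aggregate query per group (the last group may be smaller)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        column_stats_query(table, list(columns[i:i + batch_size]))
        for i in range(0, len(columns), batch_size)
    ]
```

For the roughly 50 columns in the question, `batched_stats_queries("MYTABLE", cols, 10)` would yield five queries with 30 aggregates each, while passing `batch_size=len(cols)` gives back the single-query variant. Keeping all aggregates for one batch in a single SELECT means each query only has to read the columns in that batch, which is where projection push-down can help.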