Re: Dataset column statistics

2019-02-05 Thread Flavio Pompermaier
Any news on this, Kurt? Could you share some insight into how you implemented it? I'm debating whether to run multiple jobs or whether the analysis could be performed in a single big job. Best, Flavio On Tue, Dec 18, 2018 at 3:26 AM Kurt Young wrote: > Hi, > > We have implemented ANALYZE TABLE in our inte

Re: Dataset column statistics

2018-12-18 Thread Flavio Pompermaier
Great, thanks! On Tue, Dec 18, 2018 at 3:26 AM Kurt Young wrote: > Hi, > > We have implemented ANALYZE TABLE in our internal version of Flink, and we > will try to contribute back to the community. > > Best, > Kurt > > > On Thu, Nov 29, 2018 at 9:23 PM Fabian Hueske wrote: > >> I'd try to tune

Re: Dataset column statistics

2018-12-17 Thread Kurt Young
Hi, We have implemented ANALYZE TABLE in our internal version of Flink, and we will try to contribute back to the community. Best, Kurt On Thu, Nov 29, 2018 at 9:23 PM Fabian Hueske wrote: > I'd try to tune it in a single query. > If that does not work, go for as few queries as possible, spli

Re: Dataset column statistics

2018-11-29 Thread Fabian Hueske
I'd try to tune it in a single query. If that does not work, go for as few queries as possible, splitting by column for better projection push-down. This is the first time I've heard somebody request ANALYZE TABLE. I don't see a reason why it shouldn't be added in the future. On Thu, Nov 29, 20
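A minimal sketch of the "as few queries as possible, splitting by column" fallback: it only builds the SQL strings, one query per group of columns, so that each query reads just the columns it needs. The helper name `split_stats_queries`, the table name `my_table`, and the `group_size` parameter are hypothetical and do not come from the thread.

```python
def split_stats_queries(table, columns, group_size=1,
                        aggregates=("MAX", "MIN", "AVG")):
    """Return one SELECT per group of `group_size` columns.

    Hypothetical helper: identifiers are interpolated as-is, so only
    pass trusted table and column names.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    queries = []
    for start in range(0, len(columns), group_size):
        group = columns[start:start + group_size]
        # Every aggregate for every column in this group only.
        exprs = [f"{agg}({col})" for col in group for agg in aggregates]
        queries.append(f"SELECT {', '.join(exprs)} FROM {table}")
    return queries


# `my_table` is an assumed name, used for illustration only.
queries = split_stats_queries("my_table", ["col1", "col2", "col3"], group_size=2)
for q in queries:
    print(q)
```

Setting `group_size` to the number of columns yields the single-query variant, so the same helper can be used to try the single query first and split only if that fails.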

Re: Dataset column statistics

2018-11-29 Thread Flavio Pompermaier
What do you advise for computing column stats? Should I run multiple jobs (one per column) or try to compute everything at once? Are you ever going to consider supporting ANALYZE TABLE (like in Hive or Spark) in the Flink Table API? Best, Flavio On Thu, Nov 29, 2018 at 9:45 AM Fabian Hueske wrote: > Hi, > >

Re: Dataset column statistics

2018-11-29 Thread Fabian Hueske
Hi, You could try to enable object reuse. Alternatively, you can give it more heap memory or fine-tune the GC parameters. I would not consider it a bug in Flink, but it might be something that could be improved. Fabian On Wed, Nov 28, 2018 at 18:19, Flavio Pompermaier < pomperma...@okkam.

Dataset column statistics

2018-11-28 Thread Flavio Pompermaier
Hi to all, I have a batch dataset and I want to get some standard info about its columns (like min, max, avg etc). In order to achieve this I wrote a simple program that uses SQL on the Table API, like the following: SELECT MAX(col1), MIN(col1), AVG(col1), MAX(col2), MIN(col2), AVG(col2), MAX(col3), MI
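The query pattern quoted in this message (MAX, MIN and AVG repeated for each column) can be generated rather than written by hand. The following is a minimal sketch that only builds the SQL string; the helper name `build_stats_query` and the table name `my_table` are hypothetical, since the original message is truncated before its FROM clause.

```python
def build_stats_query(table, columns, aggregates=("MAX", "MIN", "AVG")):
    """Return a single SELECT computing every aggregate for every column.

    Hypothetical helper: identifiers are interpolated as-is, so only
    pass trusted table and column names.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Column-major order, matching the pattern in the message:
    # MAX(col1), MIN(col1), AVG(col1), MAX(col2), ...
    exprs = [f"{agg}({col})" for col in columns for agg in aggregates]
    return f"SELECT {', '.join(exprs)} FROM {table}"


# `my_table` is an assumed name, used for illustration only.
query = build_stats_query("my_table", ["col1", "col2", "col3"])
print(query)
```

The resulting string could then be handed to whatever SQL entry point the job uses; that step is left out here because it depends on the Flink version and environment setup.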