Hi Xiaowei,

Thank you for raising these key points. Yes, I think they are very important for clearly defining the semantics of TableAggregateFunction! I'd like to share my thoughts on those questions:

1. Do we allow multi-staged TableAggregate in this case?

From my point of view, both Aggregate and TableAggregate should support multi-staged execution! For Aggregate, multi-staged means pre-aggregating the data. E.g., for table.select(count(*)), the optimizer can use statistics (or maybe hints) to generate a new plan that splits the count aggregate into a partial aggregate, which does the count, and a global aggregate, which sums the partial counts. Pre-aggregation can solve hot-spot issues. Partial-global aggregation is an important optimization for aggregates! Looking at the interface of TableAggregateFunction, the only difference between AggregateFunction and TableAggregateFunction is the definition of the output: getValue vs. emitValue. The rest of the calculation logic is the same. So I think we can get a lot of benefit from supporting multi-staged TableAggregate.

2. What is the semantics of emit? Is it amendments to the previous output, or replacing it?

I think the current aggregate uses getValue to update the old result, which is like the replacing behavior you mentioned! Frankly speaking, I don't quite understand the consideration behind what you said about "the previous to the previous". I would be very grateful if you could explain it in detail.

3. Do the group keys automatically appear in the output of GroupedTable.agg?

I think so, because users usually calculate by key, and in 99% of cases they expect the keys to be included in the output. What do you think?

Best,
Jincheng

Xiaowei Jiang <xiaow...@gmail.com> wrote on Tue, Nov 6, 2018 at 7:16 PM:

> Hi Jincheng,
>
> Thanks for adding the public interfaces! I think that it's a very good
> start. There are a few points that we need to discuss further.
>
> - TableAggregateFunction - this is a very complex beast, definitely the
>   most complex user-defined object we have introduced so far. I think
>   there are quite a few interesting questions here. For example, do we
>   allow multi-staged TableAggregate in this case? What is the semantics
>   of emit?
>   Is it amendments to the previous output, or replacing it? I think that
>   this subject itself is worth a discussion to make sure we get the
>   details right.
> - GroupedTable.agg - do the group keys automatically appear in the
>   output? How about the case of windowing aggregation?
>
> Regards,
> Xiaowei
>
> On Tue, Nov 6, 2018 at 6:25 PM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
> > Hi, Xiaowei,
> >
> > Thanks for bringing up the discussion of the Table API Enhancement
> > Outline!
> >
> > I quickly looked over the overall content; it is a good expression of
> > our offline discussions. But from my point of view, we should add the
> > usage of the public interfaces that we will introduce in this
> > proposal. So I added the following usage description of the
> > interfaces and operators to the google doc:
> >
> > 1. Map Operator
> >    Map operator is a new operator of Table. It can apply a scalar
> >    function and can return multiple columns. The usage is as follows:
> >
> >    val res = tab
> >      .map(fun: ScalarFunction).as('a, 'b, 'c)
> >      .select('a, 'c)
> >
> > 2. FlatMap Operator
> >    FlatMap operator is a new operator of Table. It can apply a table
> >    function and can return multiple rows. The usage is as follows:
> >
> >    val res = tab
> >      .flatMap(fun: TableFunction).as('a, 'b, 'c)
> >      .select('a, 'c)
> >
> > 3. Agg Operator
> >    Agg operator is a new operator of Table/GroupedTable. It can apply
> >    an aggregate function and can return multiple columns. The usage
> >    is as follows:
> >
> >    val res = tab
> >      .groupBy('a) // leave groupBy-Clause out to define global aggregates
> >      .agg(fun: AggregateFunction).as('a, 'b, 'c)
> >      .select('a, 'c)
> >
> > 4. FlatAgg Operator
> >    FlatAgg operator is a new operator of Table/GroupedTable. It can
> >    apply a table aggregate function and can return multiple rows.
> >    The usage is as follows:
> >
> >    val res = tab
> >      .groupBy('a) // leave groupBy-Clause out to define global table aggregates
> >      .flatAgg(fun: TableAggregateFunction).as('a, 'b, 'c)
> >      .select('a, 'c)
> >
> > 5. TableAggregateFunction
> >    The behavior of table aggregates is most like that of
> >    GroupReduceFunction, which is computed over a group of elements
> >    and outputs a group of elements. TableAggregateFunction can be
> >    applied on GroupedTable.flatAgg(). The interface of
> >    TableAggregateFunction has a lot of content, so I don't copy it
> >    here. Please look at the details in the google doc:
> >
> > https://docs.google.com/document/d/19rVeyqveGtV33UZt72GV-DP2rLyNlfs0QNGG0xWjayY/edit
> >
> > I would very much appreciate anyone reviewing and commenting.
> >
> > Best,
> > Jincheng
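
P.S. To make the partial/global split from point 1 of my reply concrete, here is a minimal sketch in plain Scala. It does not use the Table API or the optimizer at all. The names TwoStageCount, partialCount and globalSum are made up for illustration, and the in-memory sequences only stand in for parallel partitions, so please read it as a sketch of the idea rather than how the planner would implement it.

```scala
// Hypothetical, Flink-free illustration of two-stage (partial/global) counting.
object TwoStageCount {

  // Stage 1 (partial aggregate): each partition counts its own records per key.
  def partialCount(partition: Seq[String]): Map[String, Long] =
    partition.groupBy(identity).map { case (key, values) => key -> values.size.toLong }

  // Stage 2 (global aggregate): sum the partial counts per key.
  def globalSum(partials: Seq[Map[String, Long]]): Map[String, Long] =
    partials.flatten
      .groupBy { case (key, _) => key }
      .map { case (key, counts) => key -> counts.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    // Key "a" is the hot key: it shows up in every partition.
    val partitions = Seq(Seq("a", "a", "b"), Seq("a", "c"), Seq("a", "b"))
    val result = globalSum(partitions.map(partialCount))
    result.toSeq.sortBy(_._1).foreach { case (key, n) => println(s"$key -> $n") }
  }
}
```

The point is that the global stage receives at most one partial count per key from each partition instead of every raw record, which is what relieves the hot key "a" here. The second stage is a sum, not another count, exactly as in the count example above.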