Hi Xiaowei,
Thank you for raising these key points. Yes, I think they are very
important for clearly defining the semantics of TableAggregateFunction.
I'd like to share my thoughts on these questions:

1.  Do we allow multi-staged TableAggregate in this case?
From my point of view, both Aggregate and TableAggregate should support
multi-staged execution.
For Aggregate, multi-staged execution means pre-aggregating the data. For
example, for table.select(count(*)) the optimizer can generate a new plan,
based on statistics (or maybe hints), that splits the count aggregate into
a partial aggregate, which does the count, and a global aggregate, which
sums the partial counts. Pre-aggregation can solve hot spot issues, and
partial-global aggregation is an important optimization for aggregates.
Looking at the interface of TableAggregateFunction, the only difference
between AggregateFunction and TableAggregateFunction is how the output is
defined: getValue vs. emitValue. The rest of the calculation logic is the
same. So I think TableAggregate would benefit from multi-staged support as
well.
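
To make the partial/global split concrete, here is a minimal sketch in
plain Scala. It does not use the Flink API: the object name, the Top2Acc
accumulator and the method signatures are all made up for illustration,
and only the names getValue/emitValue come from the proposal. It only
shows that a table aggregate with a merge step can be staged the same way
as count.

```scala
// Illustration only: plain Scala with hypothetical names, not the Flink API.
object MultiStageSketch {
  // Stage 1 (partial): each parallel instance counts its own slice of input.
  def partialCount[T](slice: Seq[T]): Long = slice.size.toLong

  // Stage 2 (global): the final count is the sum of the partial counts.
  def globalCount(partials: Seq[Long]): Long = partials.sum

  // A table aggregate that keeps the two largest values of a group.
  final case class Top2Acc(values: List[Int])

  // Same accumulate logic as an ordinary aggregate would have.
  def accumulate(acc: Top2Acc, v: Int): Top2Acc =
    Top2Acc((v :: acc.values).sorted(Ordering[Int].reverse).take(2))

  // Merging two partial accumulators is what makes the global stage possible.
  def merge(a: Top2Acc, b: Top2Acc): Top2Acc =
    Top2Acc((a.values ++ b.values).sorted(Ordering[Int].reverse).take(2))

  // The only difference from getValue: several rows are emitted, not one value.
  def emitValue(acc: Top2Acc, out: Int => Unit): Unit = acc.values.foreach(out)
}
```

For example, with the input slices (1, 5, 3), (9, 2) and (7), the partial
counts 3, 2 and 1 sum to 6, and merging the three partial Top2Acc values
gives (9, 7), the same result as accumulating all six values in a single
stage.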

2.  What are the semantics of emit? Is it an amendment to the previous
output, or a replacement of it?
I think an aggregate currently uses getValue to update the old result,
which is the replacing behavior you mentioned.
Frankly speaking, I don't quite understand what you mean by "the previous
to the previous". I would be very grateful if you could explain it in
detail.
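
To check that we mean the same thing by "replacing" and "amending", here
is how I would sketch the two behaviors in plain Scala. This is only my
reading of the question, not a definition from the proposal: the Change
type and both functions are hypothetical names, and rows are simplified
to Int.

```scala
// Illustration only: one possible reading of the two emit semantics.
object EmitSemanticsSketch {
  sealed trait Change
  final case class Add(row: Int) extends Change
  final case class Retract(row: Int) extends Change

  // "Replacing": everything emitted earlier for the group is withdrawn
  // and the new output takes its place.
  def replace(previous: Seq[Int], current: Seq[Int]): Seq[Change] = {
    val retractions: Seq[Change] = previous.map(r => Retract(r))
    val additions: Seq[Change] = current.map(r => Add(r))
    retractions ++ additions
  }

  // "Amending": only the differences from the previous output are sent.
  def amend(previous: Seq[Int], current: Seq[Int]): Seq[Change] = {
    val retractions: Seq[Change] = previous.diff(current).map(r => Retract(r))
    val additions: Seq[Change] = current.diff(previous).map(r => Add(r))
    retractions ++ additions
  }
}
```

If a group previously emitted (9, 5) and now emits (9, 7), replace yields
Retract(9), Retract(5), Add(9), Add(7), while amend yields only
Retract(5), Add(7).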

3. Do the group keys automatically appear in the output of
GroupedTable.agg?
I think so, because users usually calculate by key, and in 99% of cases
they expect the keys to be included in the output.

What do you think?

Best,
Jincheng

On Tue, Nov 6, 2018 at 7:16 PM Xiaowei Jiang <xiaow...@gmail.com> wrote:

> Hi Jincheng,
>
> Thanks for adding the public interfaces! I think that it's a very good
> start. There are a few points that we need to discuss further.
>
>    - TableAggregateFunction - this is a very complex beast, definitely the
>    most complex user-defined object we have introduced so far. I think
>    there are quite some interesting questions here. For example, do we
>    allow multi-staged TableAggregate in this case? What are the semantics
>    of emit? Is it an amendment to the previous output, or a replacement
>    of it? I think that this subject itself is worth a discussion to make
>    sure we get the details right.
>    - GroupedTable.agg - do the group keys automatically appear in the
>    output? How about the case of windowing aggregation?
>
> Regards,
> Xiaowei
>
> On Tue, Nov 6, 2018 at 6:25 PM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
> > Hi, Xiaowei,
> >
> > Thanks for bringing up the discussion of the Table API Enhancement
> > Outline!
> >
> > I quickly looked at the overall content; it is a good summary of our
> > offline discussions. But from my point of view, we should add the usage
> > of the public interfaces that we will introduce in this proposal. So, I
> > added the following usage descriptions of the interfaces and operators
> > to the Google doc:
> >
> > 1. Map Operator
> >     Map is a new operator of Table. It applies a scalar function and
> > can return multiple columns. The usage is as follows:
> >
> >   val res = tab
> >      .map(fun: ScalarFunction).as('a, 'b, 'c)
> >      .select('a, 'c)
> >
> > 2. FlatMap Operator
> >     FlatMap is a new operator of Table. It applies a table function and
> > can return multiple rows. The usage is as follows:
> >
> >   val res = tab
> >       .flatMap(fun: TableFunction).as('a, 'b, 'c)
> >       .select('a, 'c)
> >
> > 3. Agg Operator
> >     Agg is a new operator of Table/GroupedTable. It applies an
> > aggregate function and can return multiple columns. The usage is as
> > follows:
> >
> >    val res = tab
> >       .groupBy('a) // leave out the groupBy clause to define global aggregates
> >       .agg(fun: AggregateFunction).as('a, 'b, 'c)
> >       .select('a, 'c)
> >
> > 4.  FlatAgg Operator
> >     FlatAgg is a new operator of Table/GroupedTable. It applies a table
> > aggregate function and can return multiple rows. The usage is as
> > follows:
> >
> >     val res = tab
> >        .groupBy('a) // leave out the groupBy clause to define global table aggregates
> >        .flatAgg(fun: TableAggregateFunction).as('a, 'b, 'c)
> >        .select('a, 'c)
> >
> >   5. TableAggregateFunction
> >      The behavior of a table aggregate is most similar to that of
> > GroupReduceFunction, which is computed over a group of elements and
> > outputs a group of elements. A TableAggregateFunction can be applied
> > with GroupedTable.flatAgg(). The interface of TableAggregateFunction has
> > a lot of content, so I won't copy it here. Please look at the details
> > in the Google doc:
> >
> > https://docs.google.com/document/d/19rVeyqveGtV33UZt72GV-DP2rLyNlfs0QNGG0xWjayY/edit
> >
> > I would greatly appreciate it if anyone could review and comment.
> >
> > Best,
> > Jincheng
> >
>
