Hi jingcheng,

I agree with Kurt's point. As you said "the user must know the keys of the
output of UDTAGG clearly".
If I understand correctly, the key information is strongly relative to the
UDTAGG implementation.
Users may call `flatAggregate` on a UDTAGG instance with different keys
which may result in a wrong result.
So I think it would be better to couple key information with UDTAGG
interface (i.e. "Approach 3" in your design doc).

Regards,
Jark

On Thu, 4 Jul 2019 at 18:06, Kurt Young <ykt...@gmail.com> wrote:

> Hi Jincheng,
>
> Thanks for the clarification. I think you just pointed out my concern by
> yourself:
>
> > When a user uses a User-defined table aggregate function (UDTAGG), he
> must understand the behavior of the UDTAGG, including the return type and
> the characteristics of the returned data. such as the key fields.
>
> This indicates that the UDTAGG is somehow be classified to different types,
> one will no key, one with key information. So the developer of the UDTAGG
> should choose which type of this function should be. In this case,
> my question would be, why don't we have explicit information about keys
> such as we split UDTAGG to keyed UDTAGG and non-keyed UDTAGG. So the user
> and the framework will have a better understanding of
> this UDTAGG. `withKeys` solution is letting user to choose the key and it
> seems it will only work correctly only if the user choose the *right* key
> this UDTAGG has.
>
> Let me know if this makes sense to you.
>
> Best,
> Kurt
>
>
> On Thu, Jul 4, 2019 at 4:32 PM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > @Kurt Young <ykt...@gmail.com> one user-defined table aggregate function
> > can be used in both with(out) keys case, and we do not introduce any
> other
> > aggregations. just like the explanation from @Hequn.
> >
> > @Hequn Cheng <chenghe...@gmail.com> thanks for your explanation!
> >
> > One thing should be mentioned here:
> >
> > When a user uses a User-defined table aggregate function (UDTAGG), he
> must
> > understand the behavior of the UDTAGG, including the return type and the
> > characteristics of the returned data. such as the key fields.  So
> although
> > using `withKeys` approach it is not rigorous enough(we do not need) but
> > intuitive enough, considering that if `flatAggregate` is followed by an
> > `upsertSink`, then the user must know the keys of the output of UDTAGG
> > clearly, otherwise the keys of `upsertSink` cannot be defined. So I still
> > prefer the `withKeys` solution by now.
> >
> > Looking forward to any feedback from all of you!
> >
> > Best,
> > Jincheng
> >
> >
> >
> > Hequn Cheng <chenghe...@gmail.com> 于2019年7月1日周一 下午5:35写道:
> >
> >> Hi Kurt,
> >>
> >> Thanks for your questions. Here are my thoughts.
> >>
> >> > if I want to write such kind function, should I make sure that this
> >> function is used with some keys?
> >> The key information may not be used. We can also use RetractSink to emit
> >> the table directly.
> >>
> >> >  If I need a use case to calculate topn without key, should I write
> >> another function or I can reuse previous one.
> >> For the TopN example, you can reuse the previous function if you don't
> >> care
> >> about the key information.
> >>
> >> So, the key information is only an indicator(or a description), not an
> >> operator, as Jincheng mentioned above.
> >> We do not need to change the function logic and it will not add any
> other
> >> aggregations.
> >>
> >> BTW, we have three approaches in the document. Approach 1 defines keys
> on
> >> API level as we think it's more common to define keys on Table.
> >> While approach 3 defines keys in the TableAggregateFunction which is
> more
> >> precise but it is not very clear for Table users. So, we should take all
> >> these into consideration, and make the decision in this discussion
> thread.
> >>
> >> You can take a look at the document and welcome any suggestions or other
> >> better solutions.
> >>
> >> Best, Hequn
> >>
> >>
> >> On Mon, Jul 1, 2019 at 12:13 PM Kurt Young <ykt...@gmail.com> wrote:
> >>
> >> > Hi Jincheng,
> >> >
> >> > Thanks for the clarification. Take 'TopN' for example, if I want to
> >> write
> >> > such kind function,
> >> > should I make sure that this function is used with some keys? If I
> need
> >> a
> >> > use case to calculate
> >> > topn without key, should I write another function or I can reuse
> >> previous
> >> > one.
> >> >
> >> > I'm not sure about the idea of this does not involve semantic changes.
> >> To
> >> > me, it sounds like
> >> > we are doing another nested aggregation inside the table
> >> > which TableAggregateFunction emits.
> >> >
> >> > Maybe I'm not familiar with this function enough, hope you can help me
> >> to
> >> > understand.
> >> >
> >> > Best,
> >> > Kurt
> >> >
> >> >
> >> > On Mon, Jul 1, 2019 at 11:59 AM jincheng sun <
> sunjincheng...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Kurt,
> >> > >
> >> > > Thanks for your questions, I am glad to share my thoughts here:
> >> > >
> >> > > My question is, will that effect the logic ofTableAggregateFunction
> >> user
> >> > > > wrote? Should the user know that there will a key and make some
> >> changes
> >> > > to
> >> > > > this function?
> >> > >
> >> > >
> >> > > No, the keys information depends on the implementation of the
> >> > > TableAggregateFunction.
> >> > > For example, for a `topN` user defined TableAggregateFunction, we
> can
> >> > only
> >> > > use the `keys` if the `topN` contains `rankid` in the output. You
> can
> >> > > treat the
> >> > > `keys` like an indicator.
> >> > >
> >> > > If not, how will framework deal with the output of user's
> >> > > > TableAggregateFunction.  if user output multiple rows with the
> same
> >> > key,
> >> > > > should be latter one replace previous ones?
> >> > >
> >> > >
> >> > > If a TableAggregateFunction outputs multiple rows with the same key,
> >> the
> >> > > latter one should replace the previous one, either with upsert mode
> or
> >> > > retract mode. i.e., Whether the user defines the Key or not, the
> Flink
> >> > > framework should ensure the correctness of the semantics.
> >> > >
> >> > > At present, the problem we are discussing does not involve semantic
> >> > > changes. The definition of keys is to support non-window
> >> flatAggregate on
> >> > > upsert mode. (The upsert mode is already supported in the flink
> >> > framework.
> >> > > The current discussion only needs to inform the framework that the
> >> keys
> >> > > information, which is the `withKeys` API we discussing.)
> >> > >
> >> > > Welcome any other feedbacks :)
> >> > >
> >> > > Best,
> >> > > Jincheng
> >> > >
> >> > > Kurt Young <ykt...@gmail.com> 于2019年7月1日周一 上午9:23写道:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I have a question about the key information of
> >> TableAggregateFunction.
> >> > > > IIUC, you need to define
> >> > > > something like primary key or unique key in the result table of
> >> > > > TableAggregateFunction, and also
> >> > > > need a way to let user configure this through the API. My question
> >> is,
> >> > > will
> >> > > > that effect the logic of
> >> > > > TableAggregateFunction user wrote? Should the user know that there
> >> > will a
> >> > > > key and make some changes
> >> > > > to this function?
> >> > > >
> >> > > > If so, what's the semantic the user should learned. If not, how
> will
> >> > > > framework deal with the output of user's
> >> > > > TableAggregateFunction. For example, if user output multiple rows
> >> with
> >> > > the
> >> > > > same key, should be latter one
> >> > > > replace previous ones?
> >> > > >
> >> > > > Best,
> >> > > > Kurt
> >> > > >
> >> > > >
> >> > > > On Mon, Jul 1, 2019 at 7:19 AM jincheng sun <
> >> sunjincheng...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi hequn, Thanks for the reply! I think `withKeys` solution is
> our
> >> > > better
> >> > > > > choice!
> >> > > > >
> >> > > > >
> >> > > > > Hequn Cheng <chenghe...@gmail.com> 于2019年6月26日周三 下午5:11写道:
> >> > > > >
> >> > > > > > Hi Jincheng,
> >> > > > > >
> >> > > > > > Thanks for raising the discussion!
> >> > > > > > The key information is very important for query optimizations.
> >> It
> >> > > would
> >> > > > > be
> >> > > > > > nice if we can use upsert mode to achieve better performance.
> >> > > > > >
> >> > > > > > +1 for the `withKeys` proposal. :)
> >> > > > > >
> >> > > > > > Best, Hequn
> >> > > > > >
> >> > > > > >
> >> > > > > > On Wed, Jun 26, 2019 at 4:37 PM jincheng sun <
> >> > > sunjincheng...@gmail.com
> >> > > > >
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi all,
> >> > > > > > >
> >> > > > > > > With the continuous efforts from the community, we already
> >> > > supported
> >> > > > > > > `flatAggregate`[1] on TableAPI in retract mode. I think It's
> >> > better
> >> > > > to
> >> > > > > > add
> >> > > > > > > upsert mode for  `flatAggregate`.
> >> > > > > > >
> >> > > > > > > The result table of streaming non-window `flatAggregate` is
> a
> >> > table
> >> > > > > > > contains updates. We can, of course, use a
> >> > > RetractStreamTableSink[2]
> >> > > > to
> >> > > > > > > emit the table, but we can get better performance in upsert
> >> mode.
> >> > > > > > However,
> >> > > > > > > due to the lack of keys, we can’t use an
> >> UpsertStreamTableSink to
> >> > > > emit
> >> > > > > > the
> >> > > > > > > table. We don’t have this problem for a normal aggregate as
> it
> >> > > emits
> >> > > > a
> >> > > > > > > single row for each group, so the unique keys are exactly
> the
> >> > same
> >> > > > with
> >> > > > > > the
> >> > > > > > > group keys. While for a `flatAggregate`, its pretty
> difference
> >> > that
> >> > > > due
> >> > > > > > to
> >> > > > > > > emits multi rows(a “sub-table”) for a single group. To solve
> >> this
> >> > > > > > problem,
> >> > > > > > > we need to find a way to define keys on flatAggregate, so
> >> that we
> >> > > can
> >> > > > > > also
> >> > > > > > > use upsert sink to emit the result table after
> flatAggregate.
> >> > > > > > >
> >> > > > > > > So, Aljoscha, Hequn and I prepared a design document for how
> >> to
> >> > > > define
> >> > > > > > the
> >> > > > > > > update keys for  `flatAggregate` in upsert mode.  The detail
> >> can
> >> > be
> >> > > > > found
> >> > > > > > > here:
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://docs.google.com/document/d/183qHo8PDG-xserDi_AbGP6YX9aPY0rVr80p3O3Gyz6U/edit?usp=sharing
> >> > > > > > >
> >> > > > > > > I appreciate it if you can have look at the document and any
> >> > > comments
> >> > > > > are
> >> > > > > > > welcome!
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Best,
> >> > > > > > >
> >> > > > > > > Jincheng
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > [1]
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739
> >> > > > > > >
> >> > > > > > > [2]
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/sourceSinks.html#defining-a-streamtablesource
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Reply via email to