Thanks Dian,

+1 to `deduplicate`.

Regarding `myTable.coalesce($("a"), 1).as("a")`, I'm afraid it may
conflict with, or be confused with, a built-in expression `coalesce(f0, 0)`
(which we may introduce in the future).
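
To make the concern concrete, here is a rough sketch in plain Java. It is
not the Flink API: `CoalesceSketch`, `coalesceColumn`, and the map-based row
model are made up for illustration. It only shows that the same name would
exist at two levels, once on values and once on whole tables:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CoalesceSketch {

    // Expression level: return the first non-null argument, like SQL COALESCE.
    @SafeVarargs
    public static <T> T coalesce(T... values) {
        for (T v : values) {
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    // Table level (hypothetical): replace nulls in one named column of every
    // row. Rows are modeled as maps; this is an assumption, not Flink's Row.
    public static List<Map<String, Object>> coalesceColumn(
            List<Map<String, Object>> rows, String column, Object fallback) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // Copy the row so the input is left untouched.
            Map<String, Object> copy = new LinkedHashMap<>(row);
            copy.put(column, coalesce(row.get(column), fallback));
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> r1 = new LinkedHashMap<>();
        r1.put("a", null);
        r1.put("b", "x");
        Map<String, Object> r2 = new LinkedHashMap<>();
        r2.put("a", 7);
        r2.put("b", null);

        // Only column "a" is filled; nulls in "b" are left alone.
        System.out.println(coalesceColumn(Arrays.asList(r1, r2), "a", 1));
    }
}
```

With both in scope, a reader seeing `coalesce(...)` has to check the receiver
to know which one is meant.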

Besides that, could we also align with other Flink SQL features, e.g.
event-time/processing-time temporal joins, SQL hints, and window TVFs
(FLIP-145 [1])?

Best,
Jark

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function





On Mon, 4 Jan 2021 at 22:59, Timo Walther <twal...@apache.org> wrote:

> Hi Dian,
>
> thanks for the proposed FLIP. I haven't taken a deep look at the
> proposal yet but will do so shortly. In general, we should aim to make
> the Table API as concise and self-explaining as possible. E.g. `dropna`
> does not sound obvious to me.
>
> Regarding `myTable.coalesce($("a"), 1).as("a")`: Instead of introducing
> more top-level functions, maybe we should also consider introducing more
> building blocks, e.g. for applying an expression to every column. A more
> functional approach (e.g. with lambda functions) could solve more use cases.
>
> Regards,
> Timo
>
> On 04.01.21 15:35, Seth Wiesman wrote:
> > This makes sense, I have some questions about method names.
> >
> > What do you think about renaming `dropDuplicates` to `deduplicate`? I don't
> > think that drop is the right word to use for this operation; it implies
> > records are filtered, whereas this operator actually issues updates and
> > retractions. Also, deduplicate is already how we talk about this feature in
> > the docs, so I think it would be easier for users to find.
> >
> > For null handling, I don't know how closely we want to stick to SQL
> > conventions, but what about making `coalesce` a top-level method? Something
> > like:
> >
> > myTable.coalesce($("a"), 1).as("a")
> >
> > We can require the next method to be an `as`. There is already precedent
> > for this sort of thing: `GroupedTable#aggregate` can only be followed by
> > `select`.
> >
> > Seth
> >
> > On Mon, Jan 4, 2021 at 6:27 AM Wei Zhong <weizhong0...@gmail.com> wrote:
> >
> >> Hi Dian,
> >>
> >> Big +1 for making the Table API easier to use. Java users and Python users
> >> can both benefit from it. I think it would be better if we add some Python
> >> API examples.
> >>
> >> Best,
> >> Wei
> >>
> >>
> >>> On 4 Jan 2021, at 20:03, Dian Fu <dian0511...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I'd like to start a discussion about introducing a few convenient
> >>> operations in the Table API from the perspective of ease of use.
> >>>
> >>> Currently some tasks are not easy to express in the Table API, e.g.
> >>> deduplication, top-N, etc., or not easy to express when there are
> >>> hundreds of columns in a table, e.g. null data handling.
> >>>
> >>> I'd like to propose introducing a few operations in the Table API with
> >>> the following purposes:
> >>> - Let Table API users easily leverage the powerful features already in
> >>> SQL, e.g. deduplication, top-N, etc.
> >>> - Provide some convenient operations, e.g. a series of operations for
> >>> null data handling (which may become a problem when there are hundreds
> >>> of columns), and data sampling and splitting (a very common use case in
> >>> ML, which usually needs to split a table into multiple tables for
> >>> training and validation separately).
> >>>
> >>> Please refer to FLIP-155 [1] for more details.
> >>>
> >>> Looking forward to your feedback!
> >>>
> >>> Regards,
> >>> Dian
> >>>
> >>> [1]
> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API
> >>
> >>
> >
>
>
