Thanks Dian, +1 to `deduplicate`.
Regarding `myTable.coalesce($("a"), 1).as("a")`, I'm afraid it may conflict with, or be confused with, the built-in expression `coalesce(f0, 0)` (which we may introduce in the future).

Besides that, could we also align with other features of Flink SQL, e.g. event-time/processing-time temporal join, SQL hints, window TVF (FLIP-145 [1])?

Best,
Jark

[1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function

On Mon, 4 Jan 2021 at 22:59, Timo Walther <twal...@apache.org> wrote:

> Hi Dian,
>
> thanks for the proposed FLIP. I haven't taken a deep look at the
> proposal yet but will do so shortly. In general, we should aim to make
> the Table API as concise and self-explaining as possible. E.g. `dropna`
> does not sound obvious to me.
>
> Regarding `myTable.coalesce($("a"), 1).as("a")`: Instead of introducing
> more top-level functions, maybe we should also consider introducing more
> building blocks, e.g. for applying an expression to every column. A more
> functional approach (e.g. with a lambda function) could solve more use cases.
>
> Regards,
> Timo
>
> On 04.01.21 15:35, Seth Wiesman wrote:
> > This makes sense, I have some questions about method names.
> >
> > What do you think about renaming `dropDuplicates` to `deduplicate`? I don't
> > think that drop is the right word to use for this operation, it implies
> > records are filtered where this operator actually issues updates and
> > retractions. Also, deduplicate is already how we talk about this feature in
> > the docs so I think it would be easier for users to find.
> >
> > For null handling, I don't know how close we want to stick with SQL
> > conventions but what about making `coalesce` a top-level method? Something
> > like:
> >
> > myTable.coalesce($("a"), 1).as("a")
> >
> > We can require the next method to be an `as`. There is already precedent
> > for this sort of thing, `GroupedTable#aggregate` can only be followed by
> > `select`.
> >
> > Seth
> >
> > On Mon, Jan 4, 2021 at 6:27 AM Wei Zhong <weizhong0...@gmail.com> wrote:
> >
> >> Hi Dian,
> >>
> >> Big +1 for making the Table API easier to use. Java users and Python users
> >> can both benefit from it. I think it would be better if we add some Python
> >> API examples.
> >>
> >> Best,
> >> Wei
> >>
> >>> On 4 Jan 2021, at 20:03, Dian Fu <dian0511...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I'd like to start a discussion about introducing a few convenient
> >>> operations in Table API from the perspective of ease of use.
> >>>
> >>> Currently some tasks are not easy to express in Table API, e.g.
> >>> deduplication, topn, etc., or not easy to express when there are hundreds
> >>> of columns in a table, e.g. null data handling, etc.
> >>>
> >>> I'd like to propose introducing a few operations in Table API with the
> >>> following purposes:
> >>> - Enable Table API users to easily leverage the powerful features already
> >>>   in SQL, e.g. deduplication, topn, etc.
> >>> - Provide some convenient operations, e.g. introducing a series of
> >>>   operations for null data handling (it may become a problem when there are
> >>>   hundreds of columns), data sampling and splitting (a very common use case
> >>>   in ML, which usually needs to split a table into multiple tables for
> >>>   training and validation separately).
> >>>
> >>> Please refer to FLIP-155 [1] for more details.
> >>>
> >>> Looking forward to your feedback!
> >>>
> >>> Regards,
> >>> Dian
> >>>
> >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API
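
To make the null-handling discussion in this thread a bit more concrete, here is a minimal sketch in plain Java of the "apply an expression to every column" building block that Timo suggests, combined with a `coalesce`-style default. It does not use Flink and is not the Table API: the class `ColumnWiseCoalesce` and the helpers `coalesce` and `mapColumns` are hypothetical names chosen for illustration, and rows are modeled as simple maps from column name to value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustration only: a plain-Java model of a table, not Flink's Table API.
class ColumnWiseCoalesce {

    // Hypothetical helper: returns the value, or the fallback if the value is null.
    static Object coalesce(Object value, Object fallback) {
        return value != null ? value : fallback;
    }

    // Hypothetical building block: applies one expression to every column of every row.
    static List<Map<String, Object>> mapColumns(
            List<Map<String, Object>> rows, UnaryOperator<Object> expr) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // LinkedHashMap keeps the column order and permits null values.
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, Object> column : row.entrySet()) {
                out.put(column.getKey(), expr.apply(column.getValue()));
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", null);
        row.put("b", 7);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);

        // Replace nulls with 1 in every column, without naming each column.
        System.out.println(mapColumns(rows, v -> coalesce(v, 1)));
    }
}
```

The point of the sketch is only the shape of the call: one lambda covers all columns, which is the case Dian raises for tables with hundreds of columns, whereas a top-level `coalesce` taking a single column would have to be repeated per column.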