Hi Dian,

thanks for the proposed FLIP. I haven't taken a deep look at the proposal yet but will do so shortly. In general, we should aim to make the Table API as concise and self-explanatory as possible. E.g. `dropna` does not sound obvious to me.
Regarding `myTable.coalesce($("a"), 1).as("a")`: Instead of introducing more top-level functions, maybe we should also consider introducing more building blocks, e.g. for applying an expression to every column. A more functional approach (e.g. with lambda functions) could solve more use cases.
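To make the "building block" idea a bit more concrete, here is a minimal sketch in plain Python. It is not Flink code and does not use any Table API call: rows are modelled as dictionaries, and `map_columns` and `coalesce` are hypothetical helper names chosen only for illustration. It shows how a single per-column function could cover null handling across all columns without a dedicated top-level method.

```python
from typing import Any, Callable, Dict, List

# Hypothetical row model for illustration only: column name -> value.
Row = Dict[str, Any]


def map_columns(rows: List[Row], fn: Callable[[str, Any], Any]) -> List[Row]:
    """Apply fn(column_name, value) to every column of every row.

    Returns new rows and leaves the input untouched.
    """
    return [{name: fn(name, value) for name, value in row.items()} for row in rows]


def coalesce(default: Any) -> Callable[[str, Any], Any]:
    """Build a per-column function that replaces None with a default value."""
    return lambda _name, value: default if value is None else value


rows = [{"a": None, "b": 2}, {"a": 3, "b": None}]

# One generic building block plus a lambda covers every column at once.
filled = map_columns(rows, coalesce(1))
print(filled)
```

The same `map_columns` helper could take any other per-column function (casting, trimming, and so on), which is the sense in which a functional building block might cover more use cases than one method per operation.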
Regards,
Timo

On 04.01.21 15:35, Seth Wiesman wrote:
This makes sense, I have some questions about method names.

What do you think about renaming `dropDuplicates` to `deduplicate`? I don't think that drop is the right word to use for this operation; it implies records are filtered, whereas this operator actually issues updates and retractions. Also, deduplicate is already how we talk about this feature in the docs, so I think it would be easier for users to find.

For null handling, I don't know how close we want to stick with SQL conventions, but what about making `coalesce` a top-level method? Something like:

    myTable.coalesce($("a"), 1).as("a")

We can require the next method to be an `as`. There is already precedent for this sort of thing: `GroupedTable#aggregate` can only be followed by `select`.

Seth

On Mon, Jan 4, 2021 at 6:27 AM Wei Zhong <weizhong0...@gmail.com> wrote:

Hi Dian,

Big +1 for making the Table API easier to use. Java users and Python users can both benefit from it. I think it would be better if we add some Python API examples.

Best,
Wei

On Jan 4, 2021, at 20:03, Dian Fu <dian0511...@gmail.com> wrote:

Hi all,

I'd like to start a discussion about introducing a few convenient operations in Table API from the perspective of ease of use.

Currently some tasks are not easy to express in Table API, e.g. deduplication, topn, etc., or not easy to express when there are hundreds of columns in a table, e.g. null data handling, etc.

I'd like to propose to introduce a few operations in Table API with the following purposes:

- Enable Table API users to easily leverage the powerful features already in SQL, e.g. deduplication, topn, etc.
- Provide some convenient operations, e.g. introducing a series of operations for null data handling (it may become a problem when there are hundreds of columns), data sampling and splitting (which is a very common use case in ML, which usually needs to split a table into multiple tables for training and validation separately).

Please refer to FLIP-155 [1] for more details.

Looking forward to your feedback!
Regards,
Dian

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API
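
As a rough illustration of the point raised in this thread that deduplication issues updates and retractions rather than simply filtering records, here is a small simulation in plain Python. It is only a sketch under stated assumptions: it does not call any Flink API, `deduplicate_keep_last` is a hypothetical name, it assumes a "keep the last row per key" policy, and the `+I` / `-U` / `+U` labels are used here merely as changelog markers for insert, retract-old, and emit-new.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

# Hypothetical models for illustration only.
Row = Dict[str, Any]
Change = Tuple[str, Row]  # (change kind, row)


def deduplicate_keep_last(rows: Iterable[Row], key: str) -> List[Change]:
    """Simulate keep-last deduplication as a changelog.

    The first row seen for a key is emitted as an insert ("+I").
    A later row with the same key retracts the previous row ("-U")
    and then emits the new one ("+U").
    """
    latest: Dict[Hashable, Row] = {}
    changelog: List[Change] = []
    for row in rows:
        k = row[key]
        if k in latest:
            changelog.append(("-U", latest[k]))  # retract the old version
            changelog.append(("+U", row))        # emit the new version
        else:
            changelog.append(("+I", row))        # first time this key is seen
        latest[k] = row
    return changelog


events = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}, {"id": 1, "v": "z"}]
changes = deduplicate_keep_last(events, "id")
for kind, row in changes:
    print(kind, row)
```

Three input rows produce four changelog entries here, so nothing is silently dropped from the stream; the earlier row for `id` 1 is explicitly retracted, which is the behaviour the name `deduplicate` is meant to convey better than `dropDuplicates`.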