Thanks a lot for your comments!

Regarding the Python Table API examples: I thought it would be straightforward to see how these operations are used in the Python Table API, so I had not added any. However, the suggestions make sense to me, and I have now added some Python Table API examples to make it clearer.
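The null-handling operations in the FLIP borrow their names from pandas. The pure-Python sketch below shows why a single `fillna` call can cover several columns, while a SQL-style `coalesce` needs one expression per column. It does not use PyFlink. `fillna`, `dropna` and `coalesce` here are hypothetical helpers over plain row dicts that only loosely mirror pandas semantics; they are not the FLIP's actual API.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]  # a row is modeled as column name -> value (None = null)


def coalesce(value: Any, default: Any) -> Any:
    """SQL-style COALESCE for one value: the first non-null argument."""
    return default if value is None else value


def fillna(rows: List[Row], value: Any,
           subset: Optional[Sequence[str]] = None) -> List[Row]:
    """Hypothetical fillna: replace nulls in many columns with one call.

    `value` is either a dict (column -> fill value, per-column defaults)
    or a scalar applied to every column in `subset` (all columns if None).
    Input rows are not mutated.
    """
    result = []
    for row in rows:
        new_row = dict(row)
        for col, v in row.items():
            if isinstance(value, dict):
                if col in value:
                    new_row[col] = coalesce(v, value[col])
            elif subset is None or col in subset:
                new_row[col] = coalesce(v, value)
        result.append(new_row)
    return result


def dropna(rows: List[Row], subset: Optional[Sequence[str]] = None) -> List[Row]:
    """Hypothetical dropna: drop rows with a null in any checked column."""
    kept = []
    for row in rows:
        cols = row.keys() if subset is None else subset
        if all(row[c] is not None for c in cols):
            kept.append(row)
    return kept


if __name__ == "__main__":
    rows = [{"a": None, "b": 2, "c": None}, {"a": 1, "b": None, "c": "x"}]
    # One call handles columns "a" and "c" with different defaults.
    print(fillna(rows, {"a": 0, "c": "n/a"}))
    # The coalesce style needs one expression per column instead.
    print([{**r, "a": coalesce(r["a"], 0), "c": coalesce(r["c"], "n/a")} for r in rows])
    print(dropna(rows, subset=["a"]))
```

With hundreds of columns, the per-column `coalesce` expressions grow with the column count, whereas the dict (or scalar) form of `fillna` stays a single call.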
Regarding dropDuplicates vs. deduplicate: +1 to use deduplicate. It's more consistent with the feature/concept that is already clearly documented in Flink.

Regarding `myTable.coalesce($("a"), 1).as("a")`: I'm still in favor of fillna for now. Compared to coalesce, fillna can handle multiple columns in one method call. As for the naming convention, the names "fillna/dropna/replace" come from Pandas [1][2][3].

Regarding event-time/processing-time temporal joins, SQL hints and window TVFs: Good catch! We should definitely support them in the Table API. I will update the FLIP to cover these functionalities.

[1] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html
[2] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html
[3] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html

> On Jan 4, 2021, at 10:59 PM, Timo Walther <twal...@apache.org> wrote:
>
> Hi Dian,
>
> thanks for the proposed FLIP. I haven't taken a deep look at the proposal yet but will do so shortly. In general, we should aim to make the Table API as concise and self-explaining as possible. E.g. `dropna` does not sound obvious to me.
>
> Regarding `myTable.coalesce($("a"), 1).as("a")`: Instead of introducing more top-level functions, maybe we should also consider introducing more building blocks, e.g. for applying an expression to every column. A more functional approach (e.g. with a lambda function) could solve more use cases.
>
> Regards,
> Timo
>
> On 04.01.21 15:35, Seth Wiesman wrote:
>> This makes sense, I have some questions about method names.
>> What do you think about renaming `dropDuplicates` to `deduplicate`?
>> I don't think that drop is the right word to use for this operation; it implies records are filtered, whereas this operator actually issues updates and retractions. Also, deduplicate is already how we talk about this feature in the docs, so I think it would be easier for users to find.
>>
>> For null handling, I don't know how closely we want to stick to SQL conventions, but what about making `coalesce` a top-level method? Something like:
>>
>> myTable.coalesce($("a"), 1).as("a")
>>
>> We can require the next method to be an `as`. There is already precedent for this sort of thing: `GroupedTable#aggregate` can only be followed by `select`.
>>
>> Seth
>>
>> On Mon, Jan 4, 2021 at 6:27 AM Wei Zhong <weizhong0...@gmail.com> wrote:
>>> Hi Dian,
>>>
>>> Big +1 for making the Table API easier to use. Both Java and Python users can benefit from it. I think it would be better if we added some Python API examples.
>>>
>>> Best,
>>> Wei
>>>
>>>> On Jan 4, 2021, at 8:03 PM, Dian Fu <dian0511...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'd like to start a discussion about introducing a few convenient operations in the Table API from the perspective of ease of use.
>>>>
>>>> Currently, some tasks are not easy to express in the Table API, e.g. deduplication and Top-N, or are hard to express when a table has hundreds of columns, e.g. null data handling.
>>>>
>>>> I'd like to propose introducing a few operations in the Table API for the following purposes:
>>>> - Make it easy for Table API users to leverage powerful features already available in SQL, e.g. deduplication and Top-N.
>>>> - Provide some convenient operations, e.g. a series of operations for null data handling (which can become a problem when there are hundreds of columns), and data sampling and splitting (a very common use case in ML, where a table usually needs to be split into multiple tables for training and validation).
>>>>
>>>> Please refer to FLIP-155 [1] for more details.
>>>>
>>>> Looking forward to your feedback!
>>>>
>>>> Regards,
>>>> Dian
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API
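To illustrate Seth's point above that deduplication "issues updates and retractions" rather than simply filtering rows, here is a simplified pure-Python model of keep-last-row deduplication over a stream. It is a sketch under stated assumptions, not Flink's implementation. `deduplicate_keep_last` is a hypothetical name. The only thing borrowed from Flink is the changelog short-string notation of `RowKind` (`+I` insert, `-U` update-before, `+U` update-after).

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, str]   # (dedup key, payload); a simplifying assumption
Change = Tuple[str, Row]     # (RowKind short string, row)


def deduplicate_keep_last(rows: Iterable[Row]) -> List[Change]:
    """Simplified keep-last-row deduplication emitting a changelog.

    The first row seen for a key is emitted as an insert (+I). A later row
    for the same key retracts the previously emitted row (-U) and emits the
    new one (+U), so downstream consumers see updates, not dropped records.
    """
    latest: Dict[Hashable, Row] = {}
    changelog: List[Change] = []
    for row in rows:
        key = row[0]
        if key in latest:
            changelog.append(("-U", latest[key]))  # retract the old row
            changelog.append(("+U", row))          # emit the replacement
        else:
            changelog.append(("+I", row))
        latest[key] = row
    return changelog


if __name__ == "__main__":
    for change in deduplicate_keep_last([("k1", "a"), ("k2", "b"), ("k1", "c")]):
        print(change)
```

The third input row does not disappear; it shows up as a retraction of `("k1", "a")` followed by an update to `("k1", "c")`. That is why "drop" can suggest the wrong mental model for this operator.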