Hi all,

I have updated the FLIP to cover temporal join, SQL hints and window TVF.

Regards,
Dian

> On Jan 5, 2021, at 11:58 AM, Dian Fu <dian0511...@gmail.com> wrote:
>
> Thanks a lot for your comments!
>
> Regarding Python Table API examples: I thought it would be straightforward
> to see how to use these operations in the Python Table API, so I had not
> added them. However, the suggestion makes sense to me and I have added some
> examples of how to use them in the Python Table API to make it clearer.
>
> Regarding dropDuplicates vs deduplicate: +1 to use deduplicate. It's more
> consistent with the feature/concept which is already documented clearly in
> Flink.
>
> Regarding `myTable.coalesce($("a"), 1).as("a")`: I'm still in favor of
> fillna for now. Compared to coalesce, fillna can handle multiple columns in
> one method call. As for the naming convention, the names
> "fillna/dropna/replace" come from Pandas [1][2][3].
>
> Regarding `event-time/processing-time temporal join, SQL Hints, window
> TVF`: Good catch! We should definitely support them in Table API. I will
> update the FLIP with these functionalities.
>
> [1] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html
> [2] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html
> [3] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html
>
>> On Jan 4, 2021, at 10:59 PM, Timo Walther <twal...@apache.org> wrote:
>>
>> Hi Dian,
>>
>> thanks for the proposed FLIP. I haven't taken a deep look at the proposal
>> yet but will do so shortly. In general, we should aim to make the Table API
>> as concise and self-explanatory as possible. E.g. `dropna` does not sound
>> obvious to me.
>>
>> Regarding `myTable.coalesce($("a"), 1).as("a")`: Instead of introducing more
>> top-level functions, maybe we should also consider introducing more building
>> blocks, e.g. for applying an expression to every column. A more functional
>> approach (e.g. with a lambda function) could solve more use cases.
>>
>> Regards,
>> Timo
>>
>> On 04.01.21 15:35, Seth Wiesman wrote:
>>> This makes sense, I have some questions about method names.
>>> What do you think about renaming `dropDuplicates` to `deduplicate`? I don't
>>> think that drop is the right word to use for this operation; it implies
>>> records are filtered, whereas this operator actually issues updates and
>>> retractions. Also, deduplicate is already how we talk about this feature in
>>> the docs, so I think it would be easier for users to find.
>>> For null handling, I don't know how close we want to stick with SQL
>>> conventions, but what about making `coalesce` a top-level method? Something
>>> like:
>>> myTable.coalesce($("a"), 1).as("a")
>>> We can require the next method to be an `as`. There is already precedent
>>> for this sort of thing: `GroupedTable#aggregate` can only be followed by
>>> `select`.
>>> Seth
>>> On Mon, Jan 4, 2021 at 6:27 AM Wei Zhong <weizhong0...@gmail.com> wrote:
>>>> Hi Dian,
>>>>
>>>> Big +1 for making the Table API easier to use. Java users and Python users
>>>> can both benefit from it. I think it would be better if we add some Python
>>>> API examples.
>>>>
>>>> Best,
>>>> Wei
>>>>
>>>>> On Jan 4, 2021, at 20:03, Dian Fu <dian0511...@gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to start a discussion about introducing a few convenient
>>>>> operations in Table API from the perspective of ease of use.
>>>>>
>>>>> Currently some tasks are not easy to express in Table API, e.g.
>>>>> deduplication, topn, etc., or not easy to express when there are hundreds
>>>>> of columns in a table, e.g.
>>>>> null data handling, etc.
>>>>>
>>>>> I'd like to propose introducing a few operations in Table API with the
>>>>> following purposes:
>>>>> - Let Table API users easily leverage the powerful features already
>>>>>   in SQL, e.g. deduplication, topn, etc.
>>>>> - Provide some convenient operations, e.g. introducing a series of
>>>>>   operations for null data handling (it may become a problem when there
>>>>>   are hundreds of columns), data sampling and splitting (a very common
>>>>>   use case in ML, which usually needs to split a table into multiple
>>>>>   tables for training and validation separately).
>>>>>
>>>>> Please refer to FLIP-155 [1] for more details.
>>>>>
>>>>> Looking forward to your feedback!
>>>>>
>>>>> Regards,
>>>>> Dian
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API
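
For readers following the fillna vs coalesce point in this thread, here is a rough plain-Python sketch of the difference being discussed. It is only an illustration on a list of dict rows: `coalesce_column` and `fill_na` are hypothetical helper names made up for this example, not Flink Table API or Pandas methods, and the sample data is invented.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def coalesce_column(rows: List[Row], column: str, default: Any) -> List[Row]:
    """Hypothetical coalesce-style helper: replaces None in ONE column."""
    return [
        {**row, column: default if row[column] is None else row[column]}
        for row in rows
    ]


def fill_na(rows: List[Row], defaults: Dict[str, Any]) -> List[Row]:
    """Hypothetical fillna-style helper: replaces None in SEVERAL columns."""
    return [
        {
            # Only columns listed in `defaults` are filled; others keep None.
            name: defaults[name] if value is None and name in defaults else value
            for name, value in row.items()
        }
        for row in rows
    ]


# Invented sample rows with missing values in different columns.
rows = [
    {"a": None, "b": "x", "c": None},
    {"a": 2, "b": None, "c": 3.5},
]

# coalesce-style: one call per column that needs a default.
step_by_step = coalesce_column(coalesce_column(rows, "a", 1), "b", "unknown")

# fillna-style: all defaults are passed in a single call.
in_one_call = fill_na(rows, {"a": 1, "b": "unknown"})
```

Both variants produce the same rows here; the difference is that the coalesce-style version needs one call per column, which is the concern raised above for tables with hundreds of columns.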