Thanks for preparing the FLIP, Xingbo! LGTM overall and looking forward to the voting!

Regards,
Dian

> On Sep 3, 2020, at 5:22 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
>
> Thank you! Looking forward to the voting :)
>
> Best,
> Jincheng
>
>
> Xingbo Huang <hxbks...@gmail.com> wrote on Thu, Sep 3, 2020 at 2:39 PM:
>
>> Hi Jincheng,
>>
>> Yes, I agree that users can extend the class `AggregateFunction` if they
>> want to define a Pandas UDAF by way of custom classes. I have updated
>> that part of the FLIP.
>>
>> Best,
>> Xingbo
>>
>> jincheng sun <sunjincheng...@gmail.com> wrote on Thu, Sep 3, 2020 at 1:48 PM:
>>
>>> Thanks for the update, Xingbo!
>>>
>>> Pandas UDAF can reuse the `class aggregate function (user defined
>>> function)` interface in FLIP-139, with the core logic that Pandas UDAF
>>> users write going in the `accumulate` method. In this way, we can unify
>>> the interface semantics of all UDAFs.
>>>
>>> What do you think?
>>>
>>> Best,
>>> Jincheng
>>>
>>>
>>>
>>> Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 31, 2020 at 6:06 PM:
>>>
>>>> Hi Jincheng,
>>>>
>>>> Thanks a lot for joining the discussion and for the suggestion of
>>>> discussing FLIP-137 and FLIP-139 together.
>>>>
>>>>>> 1. We also need to consider how pandas UDAF supports metrics, and
>>>>>> whether we need a custom interface for pandas UDAF?
>>>>
>>>> Yes. We need to add an interface so that users can add some logic in the
>>>> `open` or `close` method, such as creating metrics. I have added the
>>>> definition of the interface and the corresponding example in the doc.
>>>>
>>>>>> 2. We have added @udaf(), so whether to use ordinary Python UDAF?
>>>>
>>>> Yes. From the overall view of Python User Defined Function, we use @udf
>>>> to describe general python udf and pandas udf, @udtf to describe python
>>>> udtf, and @udaf to describe general python udaf and pandas udaf, which
>>>> is more unified. I will discuss it in FLIP-139 later.
>>>>
>>>> Best,
>>>> Xingbo
>>>>
>>>> jincheng sun <sunjincheng...@gmail.com> wrote on Mon, Aug 31, 2020 at 11:05 AM:
>>>>
>>>>> Hi Xingbo,
>>>>>
>>>>> Thanks for the discussion! Overall, +1 for this FLIP.
>>>>> I have two points to add:
>>>>>
>>>>> - We also need to consider how pandas UDAF supports metrics, and
>>>>> whether we need a custom interface for pandas UDAF?
>>>>> - We have added @udaf(), so will it also be used for ordinary Python
>>>>> UDAF? If not, the addition of @udaf is not appropriate. We need to
>>>>> discuss it further.
>>>>>
>>>>> We can consider it in combination with FLIP-139 for design. What do
>>>>> you think?
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>>>>
>>>>> Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 24, 2020 at 2:25 PM:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I would like to start a discussion thread on "Support Pandas UDAF in
>>>>>> PyFlink".
>>>>>>
>>>>>> Pandas UDF has been supported in Flink 1.11 (FLIP-97[1]). It solves
>>>>>> the high serialization/deserialization overhead in Python UDF and
>>>>>> makes it convenient to leverage popular Python libraries such as
>>>>>> Pandas, Numpy, etc. Since Pandas UDF has so many advantages, we want
>>>>>> to support Pandas UDAF to extend the usage of Pandas UDF.
>>>>>>
>>>>>> Dian Fu and I have discussed offline and have drafted FLIP-137[2].
>>>>>> It includes the following items:
>>>>>> - Support Pandas UDAF in Batch Group Aggregation
>>>>>> - Support Pandas UDAF in Batch Group Window Aggregation
>>>>>> - Support Pandas UDAF in Batch Over Window Aggregation
>>>>>> - Support Pandas UDAF in Stream Group Window Aggregation
>>>>>> - Support Pandas UDAF in Stream Bounded Over Window Aggregation
>>>>>>
>>>>>>
>>>>>> Looking forward to your feedback!
>>>>>>
>>>>>> Best,
>>>>>> Xingbo
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink