Hi everyone, Thanks all of you for the discussion. If there are no objections, I would like to start a vote thread tomorrow.
Best, Xingbo Dian Fu <dian0511...@gmail.com> 于2020年9月3日周四 下午5:45写道: > Thanks for preparing the FLIP, xingbo! > > LGTM overall and looking forward to the voting! > > Regards, > Dian > > > 在 2020年9月3日,下午5:22,jincheng sun <sunjincheng...@gmail.com> 写道: > > > > Thank you! looking forward to the voting :) > > > > Best, > > Jincheng > > > > > > Xingbo Huang <hxbks...@gmail.com> 于2020年9月3日周四 下午2:39写道: > > > >> Hi Jincheng, > >> > >> Yes, I agree that users can extend the class `AggregateFunction` if they > >> want to define a Pandas UDAF by the way of custom classes. I have > updated > >> the part of the FLIP. > >> > >> Best, > >> Xingbo > >> > >> jincheng sun <sunjincheng...@gmail.com> 于2020年9月3日周四 下午1:48写道: > >> > >>> Thanks for the update Xingbo! > >>> > >>> Pandas UDAF can reuse the `class aggregate function (user defined > >>> function)` interface in FLIP-139, and the core logic of Pandas UDAF > users > >>> is written in the `accumulate` method. In this way, we can unify the > >>> interface semantics of all UDAF. > >>> > >>> What do you think? > >>> > >>> Best, > >>> Jincheng > >>> > >>> > >>> > >>> Xingbo Huang <hxbks...@gmail.com> 于2020年8月31日周一 下午6:06写道: > >>> > >>>> Hi Jincheng, > >>>> > >>>> Thanks a lot for joining the discussion and the suggestion of > >> discussing > >>>> FLIP-137 and FLIP-139 together. > >>>> > >>>>>> 1. We also need to consider how pandas UDAF supports metrics, and > >>>> whether > >>>> we need a custom interface for pandas UDAF? > >>>> > >>>> Yes. We need to add an interface so that users can add some logic in > >> the > >>>> `open` or `close` method such as creating metrics. I have added the > >>>> definition of the interface and the corresponding example in the doc. > >>>> > >>>>>> 2. We have added @udaf(), so whether to use ordinary Python UDAF? > >>>> > >>>> Yes. From the overall view of Python User Defined Function, we use > @udf > >>> to > >>>> describe general python udf and pandas udf, @udtf to describe python > >>> udtf, > >>>> and @udaf to describe general python udaf and pandas udaf, which is > >> more > >>>> unified. I will discuss it in FLIP-139 later. > >>>> > >>>> Best, > >>>> Xingbo > >>>> > >>>> jincheng sun <sunjincheng...@gmail.com> 于2020年8月31日周一 上午11:05写道: > >>>> > >>>>> Hi Xingbo, > >>>>> > >>>>> Thanks for the discussion! Overall, + 1 for this FLIP. > >>>>> I have two points to add: > >>>>> > >>>>> - We also need to consider how pandas UDAF supports metrics, and > >>> whether > >>>>> we need a custom interface for pandas UDAF? > >>>>> - We have added @udaf(), so whether to use ordinary Python UDAF? If > >>> not, > >>>>> the addition of @udaf is not appropriate. We need to discuss it > >>> further. > >>>>> > >>>>> We can consider it combination with FLIP-139 for design. What do you > >>>> think? > >>>>> > >>>>> Best, > >>>>> Jincheng > >>>>> > >>>>> > >>>>> Xingbo Huang <hxbks...@gmail.com> 于2020年8月24日周一 下午2:25写道: > >>>>> > >>>>>> Hi everyone, > >>>>>> > >>>>>> I would like to start a discussion thread on "Support Pandas UDAF > >> in > >>>>>> PyFlink" > >>>>>> > >>>>>> Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves > >>> the > >>>>>> high serialization/deserialization overhead in Python UDF and makes > >>> it > >>>>>> convenient to leverage the popular Python libraries such as Pandas, > >>>>> Numpy, > >>>>>> etc. Since Pandas UDF has so many advantages, we want to support > >>> Pandas > >>>>>> UDAF to extend usage of Pandas UDF. > >>>>>> > >>>>>> Dian Fu and I have discussed offline and have drafted the > >>> FLIP-137[2]. > >>>> It > >>>>>> includes the following items: > >>>>>> - Support Pandas UDAF in Batch Group Aggregation > >>>>>> - Support Pandas UDAF in Batch Group Window Aggregation > >>>>>> - Support Pandas UDAF in Batch Over Window Aggregation > >>>>>> - Support Pandas UDAF in Stream Group Window Aggregation > >>>>>> - Support Pandas UDAF in Stream Bounded Over Window Aggregation > >>>>>> > >>>>>> > >>>>>> Looking forward to your feedback! > >>>>>> > >>>>>> Best, > >>>>>> Xingbo > >>>>>> > >>>>>> [1] > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink > >>>>>> [2] > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink > >>>>>> > >>>>> > >>>> > >>> > >> > >