Hi Jincheng,

Yes, I agree that users can extend the `AggregateFunction` class if they want to define a Pandas UDAF as a custom class. I have updated the corresponding part of the FLIP.
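For illustration only, here is a rough sketch of the shape such a custom class might take. It is not copied from the FLIP: the `AggregateFunction` below is a local stand-in rather than the PyFlink class, `create_accumulator`, `get_value`, `MeanUdaf` and `run` are assumed names, a plain list stands in for the `pandas.Series` the function would receive, and the `calls` counter stands in for a real metric.

```python
from statistics import mean


class AggregateFunction:
    """Local stand-in for the FLIP-139 base class (assumption, not pyflink)."""

    def open(self, function_context):
        pass

    def close(self):
        pass

    def create_accumulator(self):
        raise NotImplementedError

    def accumulate(self, accumulator, *args):
        raise NotImplementedError

    def get_value(self, accumulator):
        raise NotImplementedError


class MeanUdaf(AggregateFunction):
    """Hypothetical UDAF whose core logic lives in `accumulate`."""

    def open(self, function_context):
        # Setup hook: a real implementation might register metrics here.
        self.calls = 0

    def create_accumulator(self):
        # One-slot list holding the aggregated value.
        return [None]

    def accumulate(self, accumulator, column):
        # `column` is the whole batch of values (a pandas.Series in practice).
        self.calls += 1
        accumulator[0] = mean(column)

    def get_value(self, accumulator):
        return accumulator[0]


def run(udaf, column):
    """Hypothetical driver mimicking the open/accumulate/close lifecycle."""
    udaf.open(None)
    acc = udaf.create_accumulator()
    udaf.accumulate(acc, column)
    result = udaf.get_value(acc)
    udaf.close()
    return result


result = run(MeanUdaf(), [1.0, 2.0, 6.0])
```

The point of the sketch is only that setup such as metrics would go in `open`, while the aggregation itself sits in `accumulate`, so the same class-based interface could serve both general and Pandas UDAFs.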
Best,
Xingbo

jincheng sun <sunjincheng...@gmail.com> wrote on Thu, Sep 3, 2020 at 1:48 PM:

> Thanks for the update Xingbo!
>
> Pandas UDAF can reuse the `class AggregateFunction(UserDefinedFunction)`
> interface in FLIP-139, and the core logic of a user's Pandas UDAF is
> written in the `accumulate` method. In this way, we can unify the
> interface semantics of all UDAFs.
>
> What do you think?
>
> Best,
> Jincheng
>
> Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 31, 2020 at 6:06 PM:
>
> > Hi Jincheng,
> >
> > Thanks a lot for joining the discussion and for the suggestion to
> > discuss FLIP-137 and FLIP-139 together.
> >
> > >> 1. We also need to consider how pandas UDAF supports metrics, and
> > >> whether we need a custom interface for pandas UDAF?
> >
> > Yes. We need to add an interface so that users can add some logic in
> > the `open` or `close` method, such as creating metrics. I have added
> > the definition of the interface and a corresponding example to the doc.
> >
> > >> 2. We have added @udaf(), so will it also be used for ordinary
> > >> Python UDAFs?
> >
> > Yes. From the overall view of Python user-defined functions, we use
> > @udf to describe general Python UDFs and Pandas UDFs, @udtf to describe
> > Python UDTFs, and @udaf to describe general Python UDAFs and Pandas
> > UDAFs, which is more unified. I will discuss it in FLIP-139 later.
> >
> > Best,
> > Xingbo
> >
> > jincheng sun <sunjincheng...@gmail.com> wrote on Mon, Aug 31, 2020 at 11:05 AM:
> >
> > > Hi Xingbo,
> > >
> > > Thanks for the discussion! Overall, +1 for this FLIP.
> > > I have two points to add:
> > >
> > > - We also need to consider how pandas UDAF supports metrics, and
> > >   whether we need a custom interface for pandas UDAF?
> > > - We have added @udaf(), so will it also be used for ordinary Python
> > >   UDAFs? If not, the addition of @udaf is not appropriate. We need to
> > >   discuss it further.
> > >
> > > We can consider it in combination with FLIP-139 for the design. What
> > > do you think?
> > >
> > > Best,
> > > Jincheng
> > >
> > > Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 24, 2020 at 2:25 PM:
> > >
> > > > Hi everyone,
> > > >
> > > > I would like to start a discussion thread on "Support Pandas UDAF
> > > > in PyFlink".
> > > >
> > > > Pandas UDF has been supported since Flink 1.11 (FLIP-97 [1]). It
> > > > addresses the high serialization/deserialization overhead of Python
> > > > UDFs and makes it convenient to leverage popular Python libraries
> > > > such as Pandas, NumPy, etc. Since Pandas UDF has so many advantages,
> > > > we want to support Pandas UDAF to extend the usage of Pandas UDF.
> > > >
> > > > Dian Fu and I have discussed offline and have drafted FLIP-137 [2].
> > > > It includes the following items:
> > > >
> > > > - Support Pandas UDAF in Batch Group Aggregation
> > > > - Support Pandas UDAF in Batch Group Window Aggregation
> > > > - Support Pandas UDAF in Batch Over Window Aggregation
> > > > - Support Pandas UDAF in Stream Group Window Aggregation
> > > > - Support Pandas UDAF in Stream Bounded Over Window Aggregation
> > > >
> > > > Looking forward to your feedback!
> > > >
> > > > Best,
> > > > Xingbo
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink