Hi Xingbo,

Thanks for the discussion! Overall, +1 for this FLIP. I have two points to add:
- We also need to consider how Pandas UDAF supports metrics, and whether we need a custom interface for Pandas UDAF.
- We have added @udaf(), so should it also be used for ordinary Python UDAF? If not, adding @udaf is not appropriate, and we need to discuss it further. We can consider the design in combination with FLIP-139.

What do you think?

Best,
Jincheng

Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 24, 2020 at 2:25 PM:

> Hi everyone,
>
> I would like to start a discussion thread on "Support Pandas UDAF in
> PyFlink"
>
> Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves the
> high serialization/deserialization overhead in Python UDF and makes it
> convenient to leverage the popular Python libraries such as Pandas, Numpy,
> etc. Since Pandas UDF has so many advantages, we want to support Pandas
> UDAF to extend usage of Pandas UDF.
>
> Dian Fu and I have discussed offline and have drafted the FLIP-137[2]. It
> includes the following items:
>
> - Support Pandas UDAF in Batch Group Aggregation
> - Support Pandas UDAF in Batch Group Window Aggregation
> - Support Pandas UDAF in Batch Over Window Aggregation
> - Support Pandas UDAF in Stream Group Window Aggregation
> - Support Pandas UDAF in Stream Bounded Over Window Aggregation
>
> Looking forward to your feedback!
>
> Best,
> Xingbo
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink