Hi Xingbo,

Thanks for the discussion! Overall, + 1 for this FLIP.
I have two points to add:

 - We also need to consider how pandas UDAF supports metrics, and whether
we need a custom interface for pandas UDAF?
 - We have added @udaf(), so whether to use ordinary Python UDAF? If not,
the addition of @udaf is not appropriate. We need to discuss it further.

We can consider it combination with FLIP-139 for design. What do you think?

Best,
Jincheng


Xingbo Huang <hxbks...@gmail.com> 于2020年8月24日周一 下午2:25写道:

> Hi everyone,
>
> I would like to start a discussion thread on "Support Pandas UDAF in
> PyFlink"
>
> Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves the
> high serialization/deserialization overhead in Python UDF and makes it
> convenient to leverage the popular Python libraries such as Pandas, Numpy,
> etc. Since Pandas UDF has so many advantages, we want to support Pandas
> UDAF to extend usage of Pandas UDF.
>
> Dian Fu and I have discussed offline and have drafted the FLIP-137[2]. It
> includes the following items:
>   - Support Pandas UDAF in Batch Group Aggregation
>   - Support Pandas UDAF in Batch Group Window Aggregation
>   - Support Pandas UDAF in Batch Over Window Aggregation
>   - Support Pandas UDAF in Stream Group Window Aggregation
>   - Support Pandas UDAF in Stream Bounded Over Window Aggregation
>
>
> Looking forward to your feedback!
>
> Best,
> Xingbo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink
>

Reply via email to