Thanks for preparing the FLIP, Xingbo!

LGTM overall and looking forward to the voting!

Regards,
Dian

> On Sep 3, 2020, at 5:22 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
> 
> Thank you! Looking forward to the voting :)
> 
> Best,
> Jincheng
> 
> 
> Xingbo Huang <hxbks...@gmail.com> wrote on Thu, Sep 3, 2020 at 2:39 PM:
> 
>> Hi Jincheng,
>> 
>> Yes, I agree that users can extend the class `AggregateFunction` if they
>> want to define a Pandas UDAF by way of a custom class. I have updated
>> that part of the FLIP.
>> 
>> Best,
>> Xingbo
>> 
>> jincheng sun <sunjincheng...@gmail.com> wrote on Thu, Sep 3, 2020 at 1:48 PM:
>> 
>>> Thanks for the update, Xingbo!
>>> 
>>> Pandas UDAF can reuse the `AggregateFunction` (user-defined function)
>>> class interface from FLIP-139, with the core logic of the user's Pandas
>>> UDAF written in the `accumulate` method. In this way, we can unify the
>>> interface semantics of all UDAFs.
>>> 
>>> What do you think?
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> 
>>> Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 31, 2020 at 6:06 PM:
>>> 
>>>> Hi Jincheng,
>>>> 
>>>> Thanks a lot for joining the discussion and for the suggestion of
>>>> discussing FLIP-137 and FLIP-139 together.
>>>> 
>>>> >> 1. We also need to consider how pandas UDAF supports metrics, and
>>>> >> whether we need a custom interface for pandas UDAF?
>>>> 
>>>> Yes. We need to add an interface so that users can add some logic in the
>>>> `open` or `close` method, such as creating metrics. I have added the
>>>> definition of the interface and the corresponding example in the doc.
>>>> 
>>>> >> 2. We have added @udaf(), so will it also be used for ordinary Python UDAFs?
>>>> 
>>>> Yes. Looking at Python user-defined functions as a whole, we use @udf to
>>>> describe general Python UDFs and Pandas UDFs, @udtf to describe Python
>>>> UDTFs, and @udaf to describe general Python UDAFs and Pandas UDAFs, which
>>>> is more unified. I will discuss it in FLIP-139 later.
>>>> 
>>>> Best,
>>>> Xingbo
>>>> 
>>>> jincheng sun <sunjincheng...@gmail.com> wrote on Mon, Aug 31, 2020 at 11:05 AM:
>>>> 
>>>>> Hi Xingbo,
>>>>> 
>>>>> Thanks for the discussion! Overall, +1 for this FLIP.
>>>>> I have two points to add:
>>>>> 
>>>>> - We also need to consider how pandas UDAF supports metrics, and whether
>>>>> we need a custom interface for pandas UDAF?
>>>>> - We have added @udaf(), so will it also be used for ordinary Python UDAFs?
>>>>> If not, the addition of @udaf is not appropriate. We need to discuss it
>>>>> further.
>>>>> 
>>>>> We can consider this together with FLIP-139 in the design. What do you
>>>>> think?
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> 
>>>>> Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 24, 2020 at 2:25 PM:
>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> I would like to start a discussion thread on "Support Pandas UDAF in
>>>>>> PyFlink".
>>>>>> 
>>>>>> Pandas UDF has been supported since Flink 1.11 (FLIP-97[1]). It solves the
>>>>>> high serialization/deserialization overhead in Python UDF and makes it
>>>>>> convenient to leverage popular Python libraries such as Pandas, NumPy,
>>>>>> etc. Since Pandas UDF has so many advantages, we want to support Pandas
>>>>>> UDAF to extend the usage of Pandas UDF.
>>>>>> 
>>>>>> Dian Fu and I have discussed this offline and have drafted FLIP-137[2].
>>>>>> It includes the following items:
>>>>>>  - Support Pandas UDAF in Batch Group Aggregation
>>>>>>  - Support Pandas UDAF in Batch Group Window Aggregation
>>>>>>  - Support Pandas UDAF in Batch Over Window Aggregation
>>>>>>  - Support Pandas UDAF in Stream Group Window Aggregation
>>>>>>  - Support Pandas UDAF in Stream Bounded Over Window Aggregation
>>>>>> 
>>>>>> 
>>>>>> Looking forward to your feedback!
>>>>>> 
>>>>>> Best,
>>>>>> Xingbo
>>>>>> 
>>>>>> [1]
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
>>>>>> [2]
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink
