Re: [DISCUSS] Flink Python User-Defined Function for Table API

Dian Fu Tue, 13 Aug 2019 19:08:32 -0700

Hi Thomas,

Thanks a lot for the suggestions.


Regarding bundle processing, there is a "Checkpoint" section [1] in the
design doc which describes how checkpoints are handled.
However, I think you are right that we should cover it in more detail: what
bundle processing is, how it affects checkpoints and watermarks, how to
handle checkpoints and watermarks, etc.
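
To make the discussion concrete, here is a minimal Python sketch of the
idea, not the design doc's or Beam's actual implementation. The class name
BundleBuffer and its callbacks are hypothetical. It assumes the defaults
Thomas mentions (1 second or 1000 elements, whichever comes first), that a
bundle is finished before a checkpoint is taken, and that a watermark is
held back while a bundle is still open.

```python
import time


class BundleBuffer:
    """Hypothetical helper that groups elements into bundles."""

    def __init__(self, process_bundle, emit_watermark,
                 max_size=1000, max_seconds=1.0, clock=time.monotonic):
        self._process_bundle = process_bundle  # called with a list of elements
        self._emit_watermark = emit_watermark  # called with a watermark value
        self._max_size = max_size
        self._max_seconds = max_seconds
        self._clock = clock
        self._elements = []
        self._started_at = None
        self._held_watermark = None

    def add(self, element):
        if not self._elements:
            # The first element opens a new bundle and starts its timer.
            self._started_at = self._clock()
        self._elements.append(element)
        self._flush_if_due()

    def on_timer(self):
        # Called periodically so a slow stream still flushes by time.
        self._flush_if_due()

    def on_watermark(self, watermark):
        if self._elements:
            # Assumption: hold the watermark until the open bundle finishes,
            # so results of buffered elements are not emitted behind it.
            self._held_watermark = watermark
        else:
            self._emit_watermark(watermark)

    def on_checkpoint(self):
        # Assumption: finish the bundle first so that no buffered elements
        # are in flight when the snapshot is taken.
        self.flush()

    def flush(self):
        if self._elements:
            bundle, self._elements = self._elements, []
            self._started_at = None
            self._process_bundle(bundle)
        if self._held_watermark is not None:
            watermark, self._held_watermark = self._held_watermark, None
            self._emit_watermark(watermark)

    def _flush_if_due(self):
        if not self._elements:
            return
        too_many = len(self._elements) >= self._max_size
        too_old = self._clock() - self._started_at >= self._max_seconds
        if too_many or too_old:
            self.flush()
```

The clock is injectable only to make the time-based flush easy to test; a
real operator would rely on the runtime's timers instead.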

[1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3

Regards,
Dian

> On Aug 14, 2019, at 1:01 AM, Thomas Weise <t...@apache.org> wrote:
> 
> Hi Jincheng,
> 
> Thanks for putting this together. The proposal is very detailed, thorough,
> and, for me as a Beam Flink runner contributor, easy to understand :)
> 
> One thing that you should probably detail more is the bundle processing. It
> is critically important for performance that multiple elements are
> processed in a bundle. The default bundle size in the Flink runner is 1s or
> 1000 elements, whichever comes first. And for streaming, you can find the
> logic necessary to align the bundle processing with watermarks and
> checkpointing here:
> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
> 
> Thomas
> 
> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
> 
>> Hi all,
>> 
>> The Python Table API (without Python UDF support) has already been
>> implemented and will be available in the upcoming 1.9 release.
>> As Python UDF is very important for Python users, we'd like to start the
>> discussion about the Python UDF support in the Python Table API.
>> Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted a
>> design doc[1]. It includes the following items:
>> 
>> - The user-defined function interfaces.
>> - The user-defined function execution architecture.
>> 
>> As many people mentioned in the previous discussion thread[2], a
>> portability framework was introduced in recent releases of Apache Beam. It
>> provides well-defined, language-neutral data structures and protocols for
>> language-neutral user-defined function execution. This design is based on
>> Beam's portability framework. We will introduce how to make use of Beam's
>> portability framework for user-defined function execution: data
>> transmission, state access, checkpoint, metrics, logging, etc.
>> 
>> Considering that the design relies on Beam's portability framework for
>> Python user-defined function execution, and that not all contributors in
>> the Flink community are familiar with that framework, we have built a
>> prototype[3] as a proof of concept and to make the design easier to
>> understand.
>> 
>> Any feedback is welcome.
>> 
>> Best,
>> Jincheng
>> 
>> [1]
>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
>> [3] https://github.com/dianfu/flink/commits/udf_poc
>>
