Hi all,

Thanks a lot for your feedback. If there are no further suggestions or comments, I think we should initiate a vote to create a FLIP for Apache Flink Python UDFs. What do you think?

Best,
Jincheng

jincheng sun <sunjincheng...@gmail.com> wrote on Thu, Aug 15, 2019 at 12:54 AM:

> Hi Thomas,
>
> Thanks for your confirmation and the very important reminder about bundle
> processing.
>
> I have added a description of how to perform bundle processing from the
> perspective of checkpoints and watermarks. Feel free to leave comments if
> anything is not described clearly.
>
> Best,
> Jincheng
>
>
> Dian Fu <dian0511...@gmail.com> wrote on Wed, Aug 14, 2019 at 10:08 AM:
>
>> Hi Thomas,
>>
>> Thanks a lot for the suggestions.
>>
>> Regarding bundle processing, there is a section "Checkpoint"[1] in the
>> design doc which talks about how to handle checkpoints.
>> However, I think you are right that we should say more about it, such as
>> what bundle processing is, how it affects checkpoints and watermarks, and
>> how to handle them.
>>
>> [1]
>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>
>> Regards,
>> Dian
>>
>> > On Aug 14, 2019, at 1:01 AM, Thomas Weise <t...@apache.org> wrote:
>> >
>> > Hi Jincheng,
>> >
>> > Thanks for putting this together. The proposal is very detailed, thorough
>> > and, for me as a Beam Flink runner contributor, easy to understand :)
>> >
>> > One thing that you should probably detail more is the bundle processing.
>> > It is critically important for performance that multiple elements are
>> > processed in a bundle. The default bundle size in the Flink runner is 1s
>> > or 1000 elements, whichever comes first.
>> > And for streaming, you can find the
>> > logic necessary to align the bundle processing with watermarks and
>> > checkpointing here:
>> > https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
>> >
>> > Thomas
>> >
>> > On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> The Python Table API (without Python UDF support) is already supported
>> >> and will be available in the upcoming 1.9 release.
>> >> As Python UDFs are very important for Python users, we'd like to start a
>> >> discussion about Python UDF support in the Python Table API.
>> >> Aljoscha Krettek, Dian Fu and I have discussed this offline and have
>> >> drafted a design doc[1]. It includes the following items:
>> >>
>> >> - The user-defined function interfaces.
>> >> - The user-defined function execution architecture.
>> >>
>> >> As mentioned by many people in the previous discussion thread[2], a
>> >> portability framework was introduced in recent Apache Beam releases. It
>> >> provides well-defined, language-neutral data structures and protocols
>> >> for language-neutral user-defined function execution. This design is
>> >> based on Beam's portability framework. We will describe how to make use
>> >> of Beam's portability framework for user-defined function execution:
>> >> data transmission, state access, checkpoint, metrics, logging, etc.
>> >>
>> >> Considering that the design relies on Beam's portability framework for
>> >> Python user-defined function execution and not all the contributors in
>> >> the Flink community are familiar with it, we have built a prototype[3]
>> >> as a proof of concept and to make the design easier to understand.
>> >>
>> >> Any feedback is welcome.
>> >>
>> >> Best,
>> >> Jincheng
>> >>
>> >> [1]
>> >> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
>> >> [2]
>> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
>> >> [3] https://github.com/dianfu/flink/commits/udf_poc
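
To make the bundling rule from Thomas's message concrete (a bundle is closed after 1000 elements or 1s, whichever comes first), here is a minimal Python sketch. It is not taken from Beam or Flink: the class `BundleBuffer` and its methods are hypothetical names chosen for illustration. It also assumes, as one plausible way of aligning bundles with checkpoints and watermarks, that the open bundle is simply finished before a checkpoint is taken or a watermark is forwarded; the linked ExecutableStageDoFnOperator is the place to look for the actual logic.

```python
import time
from typing import Callable, List, Optional


class BundleBuffer:
    """Hypothetical helper that groups elements into bundles by count or age."""

    def __init__(self,
                 process_bundle: Callable[[List[object]], None],
                 max_size: int = 1000,
                 max_age_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._process_bundle = process_bundle  # called once per finished bundle
        self._max_size = max_size              # 1000 elements, as in the thread
        self._max_age_s = max_age_s            # 1 second, as in the thread
        self._clock = clock                    # injectable so it can be tested
        self._elements: List[object] = []
        self._started_at: Optional[float] = None

    def add(self, element: object) -> None:
        """Buffer one element and close the bundle if either limit is hit."""
        if not self._elements:
            self._started_at = self._clock()   # the bundle starts with its first element
        self._elements.append(element)
        too_big = len(self._elements) >= self._max_size
        too_old = self._clock() - self._started_at >= self._max_age_s
        # Simplification: age is only checked when an element arrives.
        # A real operator would also need a timer for idle streams.
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered elements to the bundle processor, if any."""
        if self._elements:
            bundle, self._elements = self._elements, []
            self._started_at = None
            self._process_bundle(bundle)

    def on_checkpoint(self) -> None:
        # Assumption: finish the open bundle so no buffered element is
        # missing from the state captured by the checkpoint.
        self.flush()

    def on_watermark(self, watermark: int) -> int:
        # Assumption: finish the open bundle before the watermark moves on,
        # so results for earlier elements are emitted ahead of it.
        self.flush()
        return watermark
```

A caller would invoke `add` for every incoming element and call `on_checkpoint` and `on_watermark` from the corresponding operator hooks. Larger bundles amortize the per-call overhead of crossing into the Python worker, at the cost of added latency.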