Hi Shaoxuan, Thanks for reminding that. I think "Flink Python User-Defined Function for Table" make sense to me.
Best, Jincheng Timo Walther <twal...@apache.org> 于2019年9月2日周一 下午5:04写道: > Hi all, > > the FLIP looks awesome. However, I would like to discuss the changes to > the user-facing parts again. Some feedback: > > 1. DataViews: With the current non-annotation design for DataViews, we > cannot perform eager state declaration, right? At which point during > execution do we know which state is required by the function? We need to > instantiate the function first, right? > > 2. Serializability of functions: How do we ensure serializability of > functions for catalog persistence? In the Scala/Java API, we would like > to register classes instead of instances soon. This is the only way to > store a function properly in a catalog or we need some > serialization/deserialization logic in the function interfaces to > convert an instance to string properties. > > 3. TableEnvironment: What is the signature of `register_function(self, > name, function)`? Does it accept both a class and function? Like `class > Sum` and `def split()`? Could you add some examples for registering both > kinds of functions? > > 4. FunctionDefinition: Function definition is not a user-defined > function definition. It is the highest interface for both user-defined > and built-in functions. I'm not sure if getLanguage() should be part of > this interface or one-level down which would be `UserDefinedFunction`. > Built-in functions will never be implemented in a different language. In > any case, I would vote for removing the UNKNOWN language, because it > does not solve anything. Why should a user declare a function that the > runtime can not handle? I also find the term `JAVA` confusing for Scala > users. How about `FunctionLanguage.JVM` instead? > > 5. Function characteristics: In the current design, function classes do > not extend from any upper class. How can users declare characteristics > that are present in `FunctionDefinition` like determinism, requirements, > or soon also monotonism. > > Thanks, > Timo > > > On 02.09.19 03:38, Shaoxuan Wang wrote: > > Hi Jincheng, Fudian, and Aljoscha, > > I am assuming the proposed python UDX can also be applied to Flink SQL. > > Is this correct? If yes, I would suggest to title the FLIP as "Flink > Python > > User-Defined Function" or "Flink Python User-Defined Function for Table". > > > > Regards, > > Shaoxuan > > > > > > On Wed, Aug 28, 2019 at 12:22 PM jincheng sun <sunjincheng...@gmail.com> > > wrote: > > > >> Thanks for the feedback Bowen! > >> > >> Great thanks for create the FLIP and bring up the VOTE Dian! > >> > >> Best, Jincheng > >> > >> Dian Fu <dian0511...@gmail.com> 于2019年8月28日周三 上午11:32写道: > >> > >>> Hi all, > >>> > >>> I have started a voting thread [1]. Thanks a lot for your help during > >>> creating the FLIP @Jincheng. > >>> > >>> > >>> Hi Bowen, > >>> > >>> Very appreciated for your comments. I have replied you in the design > doc. > >>> As it seems that the comments doesn't affect the overall design, I'll > not > >>> cancel the vote for now and we can continue the discussion in the > design > >>> doc. > >>> > >>> [1] > >>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html > >>> < > >>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html > >>> Regards, > >>> Dian > >>> > >>>> 在 2019年8月28日,上午11:05,Bowen Li <bowenl...@gmail.com> 写道: > >>>> > >>>> Hi Jincheng and Dian, > >>>> > >>>> Sorry for being late to the party. I took a glance at the proposal, > >> LGTM > >>> in > >>>> general, and I left only a couple comments. > >>>> > >>>> Thanks, > >>>> Bowen > >>>> > >>>> > >>>> On Mon, Aug 26, 2019 at 8:05 PM Dian Fu <dian0511...@gmail.com> > wrote: > >>>> > >>>>> Hi Jincheng, > >>>>> > >>>>> Thanks! It works. > >>>>> > >>>>> Thanks, > >>>>> Dian > >>>>> > >>>>>> 在 2019年8月27日,上午10:55,jincheng sun <sunjincheng...@gmail.com> 写道: > >>>>>> > >>>>>> Hi Dian, can you check if you have edit access? :) > >>>>>> > >>>>>> > >>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月26日周一 上午10:52写道: > >>>>>> > >>>>>>> Hi Jincheng, > >>>>>>> > >>>>>>> Appreciated for the kind tips and offering of help. Definitely need > >>> it! > >>>>>>> Could you grant me write permission for confluence? My Id: Dian Fu > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Dian > >>>>>>> > >>>>>>>> 在 2019年8月26日,上午9:53,jincheng sun <sunjincheng...@gmail.com> 写道: > >>>>>>>> > >>>>>>>> Thanks for your feedback Hequn & Dian. > >>>>>>>> > >>>>>>>> Dian, I am glad to see that you want help to create the FLIP! > >>>>>>>> Everyone will have first time, and I am very willing to help you > >>>>> complete > >>>>>>>> your first FLIP creation. Here some tips: > >>>>>>>> > >>>>>>>> - First I'll give your account write permission for confluence. > >>>>>>>> - Before create the FLIP, please have look at the FLIP Template > >> [1], > >>>>>>> (It's > >>>>>>>> better to know more about FLIP by reading [2]) > >>>>>>>> - Create Flink Python UDFs related JIRAs after completing the VOTE > >> of > >>>>>>>> FLIP.(I think you also can bring up the VOTE thread, if you want! > ) > >>>>>>>> > >>>>>>>> Any problems you encounter during this period,feel free to tell me > >>> that > >>>>>>> we > >>>>>>>> can solve them together. :) > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jincheng > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> [1] > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template > >>>>>>>> [2] > >>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > >>>>>>>> > >>>>>>>> Hequn Cheng <chenghe...@gmail.com> 于2019年8月23日周五 上午11:54写道: > >>>>>>>> > >>>>>>>>> +1 for starting the vote. > >>>>>>>>> > >>>>>>>>> Thanks Jincheng a lot for the discussion. > >>>>>>>>> > >>>>>>>>> Best, Hequn > >>>>>>>>> > >>>>>>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu <dian0511...@gmail.com> > >>>>> wrote: > >>>>>>>>>> Hi Jincheng, > >>>>>>>>>> > >>>>>>>>>> +1 to start the FLIP create and VOTE on this feature. I'm > willing > >>> to > >>>>>>> help > >>>>>>>>>> on the FLIP create if you don't mind. As I haven't created a > FLIP > >>>>>>> before, > >>>>>>>>>> it will be great if you could help on this. :) > >>>>>>>>>> > >>>>>>>>>> Regards, > >>>>>>>>>> Dian > >>>>>>>>>> > >>>>>>>>>>> 在 2019年8月22日,下午11:41,jincheng sun <sunjincheng...@gmail.com> > >> 写道: > >>>>>>>>>>> Hi all, > >>>>>>>>>>> > >>>>>>>>>>> Thanks a lot for your feedback. If there are no more > suggestions > >>> and > >>>>>>>>>>> comments, I think it's better to initiate a vote to create a > >> FLIP > >>>>> for > >>>>>>>>>>> Apache Flink Python UDFs. > >>>>>>>>>>> What do you think? > >>>>>>>>>>> > >>>>>>>>>>> Best, Jincheng > >>>>>>>>>>> > >>>>>>>>>>> jincheng sun <sunjincheng...@gmail.com> 于2019年8月15日周四 > >> 上午12:54写道: > >>>>>>>>>>>> Hi Thomas, > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks for your confirmation and the very important reminder > >>> about > >>>>>>>>>> bundle > >>>>>>>>>>>> processing. > >>>>>>>>>>>> > >>>>>>>>>>>> I have had add the description about how to perform bundle > >>>>> processing > >>>>>>>>>> from > >>>>>>>>>>>> the perspective of checkpoint and watermark. Feel free to > leave > >>>>>>>>>> comments if > >>>>>>>>>>>> there are anything not describe clearly. > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Jincheng > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月14日周三 上午10:08写道: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi Thomas, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks a lot the suggestions. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Regarding to bundle processing, there is a section > >>> "Checkpoint"[1] > >>>>>>> in > >>>>>>>>>> the > >>>>>>>>>>>>> design doc which talks about how to handle the checkpoint. > >>>>>>>>>>>>> However, I think you are right that we should talk more about > >>> it, > >>>>>>>>> such > >>>>>>>>>> as > >>>>>>>>>>>>> what's bundle processing, how it affects the checkpoint and > >>>>>>>>> watermark, > >>>>>>>>>> how > >>>>>>>>>>>>> to handle the checkpoint and watermark, etc. > >>>>>>>>>>>>> > >>>>>>>>>>>>> [1] > >>>>>>>>>>>>> > >> > https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3 > >>>>>>>>>>>>> < > >>>>>>>>>>>>> > >> > https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3 > >>>>>>>>>>>>> Regards, > >>>>>>>>>>>>> Dian > >>>>>>>>>>>>> > >>>>>>>>>>>>>> 在 2019年8月14日,上午1:01,Thomas Weise <t...@apache.org> 写道: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi Jincheng, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks for putting this together. The proposal is very > >>> detailed, > >>>>>>>>>>>>> thorough > >>>>>>>>>>>>>> and for me as a Beam Flink runner contributor easy to > >>> understand > >>>>> :) > >>>>>>>>>>>>>> One thing that you should probably detail more is the bundle > >>>>>>>>>>>>> processing. It > >>>>>>>>>>>>>> is critically important for performance that multiple > >> elements > >>>>> are > >>>>>>>>>>>>>> processed in a bundle. The default bundle size in the Flink > >>>>> runner > >>>>>>>>> is > >>>>>>>>>>>>> 1s or > >>>>>>>>>>>>>> 1000 elements, whichever comes first. And for streaming, you > >>> can > >>>>>>>>> find > >>>>>>>>>>>>> the > >>>>>>>>>>>>>> logic necessary to align the bundle processing with > >> watermarks > >>>>> and > >>>>>>>>>>>>>> checkpointing here: > >>>>>>>>>>>>>> > >> > https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java > >>>>>>>>>>>>>> Thomas > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun < > >>>>>>>>>> sunjincheng...@gmail.com> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The Python Table API(without Python UDF support) has > already > >>>>> been > >>>>>>>>>>>>> supported > >>>>>>>>>>>>>>> and will be available in the coming release 1.9. > >>>>>>>>>>>>>>> As Python UDF is very important for Python users, we'd like > >> to > >>>>>>>>> start > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>> discussion about the Python UDF support in the Python Table > >>> API. > >>>>>>>>>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline and > >>> have > >>>>>>>>>>>>> drafted a > >>>>>>>>>>>>>>> design doc[1]. It includes the following items: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> - The user-defined function interfaces. > >>>>>>>>>>>>>>> - The user-defined function execution architecture. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> As mentioned by many guys in the previous discussion > >>> thread[2], > >>>>> a > >>>>>>>>>>>>>>> portability framework was introduced in Apache Beam in > >> latest > >>>>>>>>>>>>> releases. It > >>>>>>>>>>>>>>> provides well-defined, language-neutral data structures and > >>>>>>>>> protocols > >>>>>>>>>>>>> for > >>>>>>>>>>>>>>> language-neutral user-defined function execution. This > >> design > >>> is > >>>>>>>>>> based > >>>>>>>>>>>>> on > >>>>>>>>>>>>>>> Beam's portability framework. We will introduce how to make > >>> use > >>>>> of > >>>>>>>>>>>>> Beam's > >>>>>>>>>>>>>>> portability framework for user-defined function execution: > >>> data > >>>>>>>>>>>>>>> transmission, state access, checkpoint, metrics, logging, > >> etc. > >>>>>>>>>>>>>>> Considering that the design relies on Beam's portability > >>>>> framework > >>>>>>>>>> for > >>>>>>>>>>>>>>> Python user-defined function execution and not all the > >>>>>>> contributors > >>>>>>>>>> in > >>>>>>>>>>>>>>> Flink community are familiar with Beam's portability > >>> framework, > >>>>> we > >>>>>>>>>> have > >>>>>>>>>>>>>>> done a prototype[3] for proof of concept and also ease of > >>>>>>>>>>>>> understanding of > >>>>>>>>>>>>>>> the design. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Welcome any feedback. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Jincheng > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >> > https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing > >>>>>>>>>>>>>>> [2] > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html > >>>>>>>>>>>>>>> [3] https://github.com/dianfu/flink/commits/udf_poc > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>> > >>>>> > >>> > >