Hi,

Thanks for the quick response! I think this looks good now and it should be something that everyone can agree on as a first step.
Best,
Aljoscha

On 6. Sep 2019, at 12:22, Dian Fu <dian0511...@gmail.com> wrote:

Hi all,

I have updated the FLIP, removed the content related to UDAF, and changed the title of the FLIP to "Flink Python User-Defined Stateless Function for Table". Does that make sense to you?

Regards,
Dian

On 6 Sep 2019, at 6:09 PM, Dian Fu <dian0511...@gmail.com> wrote:

Hi all,

Thanks a lot for the discussion here. It makes sense to limit the scope of this FLIP to ScalarFunction only. I'll update the FLIP and remove the content relating to UDAF.

Thanks,
Dian

On 6 Sep 2019, at 6:02 PM, jincheng sun <sunjincheng...@gmail.com> wrote:

Hi,

Sure. To ensure the 1.10 release of Flink, let's split the FLIPs and have FLIP-58 cover only the stateless part.

Cheers,
Jincheng

Aljoscha Krettek <aljos...@apache.org> wrote on Fri, 6 Sep 2019 at 5:53 PM:

Hi,

Regarding stateful functions and MapView/DataView/ListView: I think it's best to keep that for a later FLIP and focus on a more basic version. Supporting stateful functions, especially with MapView, can potentially be very slow, so we have to see what we can do there.

For the method names, I don't know. If FLIP-64 passes, they have to be changed. So we could use the final names right away, but I'm also fine with using the old method names for now.

Best,
Aljoscha

On 5. Sep 2019, at 12:40, jincheng sun <sunjincheng...@gmail.com> wrote:

Hi Aljoscha,

Thanks for your comments!

Regarding the FLIP scope, it seems that we have agreed on the design of the stateless function support. What do you think about starting the development of the stateless function support first and continuing the discussion of stateful function support?
Or do you think we should split the current FLIP into two FLIPs and discuss the stateful function support in another thread?

Currently, the design of the Python DataView/MapView/ListView interfaces follows the Java/Scala naming conventions. Of course, we can continue to discuss whether there are better solutions, e.g. using annotations.

Regarding the magic logic to support DataView/MapView/ListView: it will be done by the framework and is transparent to users. In my understanding, the magic logic is unavoidable no matter what the interfaces look like.

Regarding the catalog support of Python functions:
1) If a function is stored in memory as a temporary object, just as you said, users can call TableEnvironment.register_function (which will change to register_temporary_function under FLIP-64).
2) If it is persisted in external storage, users can call Catalog.create_function. In my understanding, there will be no API change.

What do you think?

Best,
Jincheng

Aljoscha Krettek <aljos...@apache.org> wrote on Thu, 5 Sep 2019 at 5:32 PM:

Hi,

Another thing to consider is the scope of the FLIP. Currently, we try to support (stateful) AggregateFunctions. I have some concerns about whether or not DataView/MapView/ListView is a good interface, because it requires quite some magic from the runners to make it work, such as messing with the TypeInformation and injecting objects at runtime. If the FLIP aims for the minimum of ScalarFunctions and the whole execution harness, that should be easier to agree on.

Another point is the naming of the new methods. I think Timo hinted at the fact that we have to consider catalog support for functions.
There is ongoing work on differentiating between temporary objects and objects that are stored in a catalog (FLIP-64 [1]). With this in mind, the method for registering functions should be called register_temporary_function() and so on, unless we want to already think about mixing Python and Java functions in the catalog, which is outside the scope of this FLIP, I think.

Best,
Aljoscha

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

On 5. Sep 2019, at 05:01, jincheng sun <sunjincheng...@gmail.com> wrote:

Hi Aljoscha,

That's a good point. So far, most of the code will live in the flink-python module; the rules and RelNodes will be put into both the Blink and the Flink planner modules, and some of the common interfaces required by the planners will be placed in flink-table-common. I think you are right that we should try to keep the changes for this feature minimal. We will follow this principle when reviewing the PRs.

Many thanks for your questions and the reminder!

Best,
Jincheng

Aljoscha Krettek <aljos...@apache.org> wrote on Wed, 4 Sep 2019 at 8:58 PM:

Hi,

Things look interesting so far!

I had one question: where will most of the support code for this live? Will this add the required code to flink-table-common or to the different runners? Can we implement this in such a way that only a minimal amount of support code is required in the parts of the Table API (and Table API runners) that are not Python-specific?

Best,
Aljoscha

On 4. Sep 2019, at 14:14, Timo Walther <twal...@apache.org> wrote:

Hi Jincheng,

2. Serializability of functions: "#2 is very convenient for users" holds only until users hit their first backwards-compatibility issue; after that, they will find it not so convenient anymore and will ask why the framework allowed storing such objects in persistent storage. I don't want to be picky about it, but I wanted to raise awareness that sometimes it is OK to limit use cases to guide users toward developing backwards-compatible programs.

Thanks for the explanation of the remaining items. It sounds reasonable to me. Regarding the example with `getKind()`, I actually meant `org.apache.flink.table.functions.ScalarFunction#getKind`: we don't allow users to override this property, and I think we should do something similar for the getLanguage property.

Thanks,
Timo

On 03.09.19 15:01, jincheng sun wrote:

Hi Timo,

Thanks for the quick reply! :)
I have added more examples for #3 and #5 to the FLIP. Those are great suggestions!

Regarding 2:

There are two kinds of serialization in CloudPickle (which differs from Java):
1) For a class or function that can be imported, CloudPickle only serializes the full path of the class or function (just like a Java class name).
2) For a class or function that cannot be imported, CloudPickle serializes the full content of the class or function.
Because of #2, we cannot just store the full path of the class or function.

The above serialization is recursive.
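To make the two CloudPickle cases concrete, here is a small sketch that uses Python's standard `pickle` module as a stand-in (CloudPickle itself is a third-party package and is not used here). Standard `pickle` behaves like case #1, storing an importable function only by module path and name, but it refuses case #2 (for example a lambda), which is the gap CloudPickle fills by serializing the function body itself. The last helper illustrates the compatibility concern: bytes that reference a module path stop loading once that path is gone. The module name `udfs_v1` and all helper names are hypothetical.

```python
import json
import pickle
import sys
import types


def pickled_by_reference(func) -> bool:
    """True if pickle stores `func` by name and resolves it back to the same object."""
    payload = pickle.dumps(func)
    return func.__qualname__.encode() in payload and pickle.loads(payload) is func


def plain_pickle_fails(func) -> bool:
    """True if the standard pickle module cannot serialize `func` (case #2)."""
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError):
        return True
    return False


def load_after_module_removed() -> str:
    """Pickle a function from a hypothetical module, then make that module path disappear."""
    module = types.ModuleType("udfs_v1")
    # Define the function inside the module so its __module__ is "udfs_v1".
    exec("def subtract_one(i):\n    return i - 1\n", module.__dict__)
    sys.modules["udfs_v1"] = module
    payload = pickle.dumps(module.subtract_one)  # stores only "udfs_v1" + "subtract_one"

    # Simulate the user moving or renaming the module after the bytes were persisted.
    del sys.modules["udfs_v1"]
    try:
        pickle.loads(payload)
    except ImportError as exc:
        return type(exc).__name__
    return "loaded"
```

For example, `pickled_by_reference(json.dumps)` holds, a lambda cannot be pickled this way at all, and the persisted bytes for `udfs_v1.subtract_one` raise `ModuleNotFoundError` once `udfs_v1` no longer exists.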
However, there is indeed a backwards-compatibility problem when the module path of the parent class changes. But I think this is a rare case and acceptable. For example, in the Flink framework we never change the module path of a user-facing interface if we want to keep backwards compatibility. For user code, if users change the interface of a UDF's parent, they should re-register their functions.

If we do not want to support #2, we can store the full path of the class and function; in that case, we have no backwards-compatibility problem. But I think #2 is very convenient for users.

What do you think?

Regarding 4:
As I mentioned earlier, there may be built-in Python functions, and I think language is a "function" concept. Function and language are orthogonal concepts. We may have R, Go, and other language functions in the future, not only user-defined but also built-in functions.

You are right that users will not set this method; for Python functions, it will be set by the framework in the code-generated Java function. So I think we should declare getLanguage() in FunctionDefinition for now. (I'm not quite sure what you mean by saying that getKind() is final in UserDefinedFunction.)

Best,
Jincheng

Timo Walther <twal...@apache.org> wrote on Tue, 3 Sep 2019 at 6:01 PM:

Hi Jincheng,

thanks for your response.

2. Serializability of functions: Using some arbitrary serialization format for shipping a function to the worker sounds fine to me.
But once we store functions in the catalog, we need to think about backwards compatibility, evolution of interfaces, etc. I'm not sure if CloudPickle is the right long-term storage format for this. If we don't think about this in advance, we are basically violating the rule in our code quality guide [1] never to use Java Serialization, just in the Python way: we would be using the RPC serialization for persistence.

3. TableEnvironment: Can you add some examples to the FLIP? API code like the following is not covered there:

    self.t_env.register_function("add_one", udf(lambda i: i + 1, DataTypes.BIGINT(), DataTypes.BIGINT()))
    self.t_env.register_function("subtract_one", udf(SubtractOne(), DataTypes.BIGINT(), DataTypes.BIGINT()))
    self.t_env.register_function("add", add)

4. FunctionDefinition: Your response still doesn't answer my question entirely. Why do we need FunctionDefinition.getLanguage() if this is a "user-defined function" concept and not a "function" concept? In any case, users should not be able to set this method, so it must be final in UserDefinedFunction, similar to getKind().

5. Function characteristics: If UserDefinedFunction is defined in Python, why is it not used in your example in FLIP-58? Could you extend the example in the FLIP to show how to specify these attributes?
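Points 3 to 5 can be illustrated together with a small sketch that does not depend on PyFlink. Every name in it (`FunctionLanguage`, `UserDefinedFunction`, `as_callable`, `SubtractOne`, `RandomOffset`) is a hypothetical stand-in, not PyFlink's actual API; in Timo's snippet the real entry point is the `udf()` wrapper. It shows one way a registration method could accept a lambda, a plain function, a UDF class or a UDF instance, how a user could declare determinism by overriding a method on a common base class, and how a base class could keep the language property non-overridable, analogous to the final `getKind()` in Java.

```python
import enum
import random
from typing import Any, Callable


class FunctionLanguage(enum.Enum):
    """Hypothetical enum after renaming JAVA to JVM and dropping UNKNOWN."""
    JVM = "JVM"
    PYTHON = "PYTHON"


class UserDefinedFunction:
    """Hypothetical base class for Python UDFs (not PyFlink's class)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Python analogue of a final method: reject overrides at class-creation time.
        if "language" in cls.__dict__:
            raise TypeError(f"{cls.__name__} must not override 'language'")

    @property
    def language(self) -> FunctionLanguage:
        return FunctionLanguage.PYTHON

    def is_deterministic(self) -> bool:
        # A user-declarable characteristic; deterministic by default.
        return True


class SubtractOne(UserDefinedFunction):
    def eval(self, i: int) -> int:
        return i - 1


class RandomOffset(UserDefinedFunction):
    def eval(self, i: int) -> int:
        return i + random.randint(0, 9)

    def is_deterministic(self) -> bool:
        return False  # results differ between calls with the same input


def as_callable(function: Any) -> Callable[..., Any]:
    """Hypothetical normalization a register_function could apply to its argument."""
    if isinstance(function, type) and issubclass(function, UserDefinedFunction):
        return function().eval  # a UDF class: instantiate it
    if isinstance(function, UserDefinedFunction):
        return function.eval  # a UDF instance
    if callable(function):
        return function  # a lambda or plain function
    raise TypeError(f"not a function: {function!r}")


def add(i: int, j: int) -> int:
    return i + j
```

Rejecting the override when the subclass is created only catches attributes defined directly in the subclass body, which is enough to mirror the intent of a `final` property.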
Regards,
Timo

[1] https://flink.apache.org/contributing/code-style-and-quality-java.html

On 02.09.19 15:35, jincheng sun wrote:

Hi Timo,

Many thanks for your feedback. I would like to share my thoughts with you inline. :)

Best,
Jincheng

Timo Walther <twal...@apache.org> wrote on Mon, 2 Sep 2019 at 5:04 PM:

> Hi all,
>
> the FLIP looks awesome. However, I would like to discuss the changes to the user-facing parts again. Some feedback:
>
> 1. DataViews: With the current non-annotation design for DataViews, we cannot perform eager state declaration, right? At which point during execution do we know which state is required by the function? We need to instantiate the function first, right?

We will analyze the Python AggregateFunction and extract the DataViews it uses. This can be done by instantiating the Python AggregateFunction, creating an accumulator by calling the create_accumulator method, and then analyzing the created accumulator. This is actually similar to how the code generation logic handles a Java AggregateFunction. The extracted DataViews can then be used to construct the StateDescriptors in the operator; that is, the Java operator holds the state spec and the state descriptor ID, and the Python worker can access the state by specifying the corresponding state descriptor ID.

> 2. Serializability of functions: How do we ensure serializability of functions for catalog persistence? In the Scala/Java API, we would like to register classes instead of instances soon. This is the only way to store a function properly in a catalog, or else we need some serialization/deserialization logic in the function interfaces to convert an instance to string properties.

The Python function will be serialized with CloudPickle anyway in the Python API, as we need to transfer it to the Python worker, which can then deserialize it for execution. The serialized Python function can be stored in the catalog.

> 3. TableEnvironment: What is the signature of `register_function(self, name, function)`? Does it accept both a class and a function, like `class Sum` and `def split()`? Could you add some examples for registering both kinds of functions?

What you mentioned is already supported. You can find an example in the POC code:
https://github.com/dianfu/flink/commit/93f41ba173482226af7513fdec5acba72b274489#diff-34f619b31a7e38604e22a42a441fbe2fR26

> 4. FunctionDefinition: Function definition is not a user-defined function definition. It is the highest interface for both user-defined and built-in functions. I'm not sure if getLanguage() should be part of this interface or one level down, which would be `UserDefinedFunction`. Built-in functions will never be implemented in a different language.
> In any case, I would vote for removing the UNKNOWN language, because it does not solve anything. Why should a user declare a function that the runtime cannot handle? I also find the term `JAVA` confusing for Scala users. How about `FunctionLanguage.JVM` instead?

Actually, we may have built-in Python functions in the future. Consider the expression py_udf1(a, b) + py_udf2(c): if there is a built-in Python function for the '+' operator, then we don't need to mix Java and Python functions. In this way, we can improve the execution performance.
Removing FunctionLanguage.UNKNOWN and renaming FunctionLanguage.Java to FunctionLanguage.JVM makes sense to me.

> 5. Function characteristics: In the current design, function classes do not extend from any upper class. How can users declare characteristics that are present in `FunctionDefinition`, like determinism, requirements, or soon also monotonism?

Actually, we have defined 'UserDefinedFunction', which is the base class for all user-defined functions. We can define determinism, requirements, etc. in this class. Defining determinism is already supported.

> Thanks,
> Timo

On 02.09.19 03:38, Shaoxuan Wang wrote:

Hi Jincheng, Fudian, and Aljoscha,
I am assuming the proposed Python UDX can also be applied to Flink SQL. Is this correct?
If yes, I would suggest titling the FLIP "Flink Python User-Defined Function" or "Flink Python User-Defined Function for Table".

Regards,
Shaoxuan

On Wed, Aug 28, 2019 at 12:22 PM jincheng sun <sunjincheng...@gmail.com> wrote:

Thanks for the feedback, Bowen!

Many thanks for creating the FLIP and bringing up the VOTE, Dian!

Best,
Jincheng

Dian Fu <dian0511...@gmail.com> wrote on Wed, 28 Aug 2019 at 11:32 AM:

Hi all,

I have started a voting thread [1]. Thanks a lot for your help in creating the FLIP, @Jincheng.

Hi Bowen,

Thanks a lot for your comments. I have replied to you in the design doc. As it seems that the comments don't affect the overall design, I won't cancel the vote for now, and we can continue the discussion in the design doc.
[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html

Regards,
Dian

On 28 Aug 2019, at 11:05 AM, Bowen Li <bowenl...@gmail.com> wrote:

Hi Jincheng and Dian,

Sorry for being late to the party. I took a glance at the proposal: LGTM in general, and I left only a couple of comments.

Thanks,
Bowen

On Mon, Aug 26, 2019 at 8:05 PM Dian Fu <dian0511...@gmail.com> wrote:

Hi Jincheng,

Thanks! It works.

Thanks,
Dian

On 27 Aug 2019, at 10:55 AM, jincheng sun <sunjincheng...@gmail.com> wrote:

Hi Dian, can you check if you have edit access? :)

Dian Fu <dian0511...@gmail.com> wrote on Mon, 26 Aug 2019 at 10:52 AM:

Hi Jincheng,

Thanks a lot for the kind tips and the offer of help. I definitely need it! Could you grant me write permission for Confluence?
My ID: Dian Fu

Thanks,
Dian

On 26 Aug 2019, at 9:53 AM, jincheng sun <sunjincheng...@gmail.com> wrote:

Thanks for your feedback, Hequn & Dian.

Dian, I am glad to see that you want to help create the FLIP! Everyone has a first time, and I am very willing to help you complete your first FLIP. Here are some tips:

- First, I'll give your account write permission for Confluence.
- Before creating the FLIP, please have a look at the FLIP Template [1] (it's good to learn more about FLIPs by reading [2]).
- Create the JIRAs related to Flink Python UDFs after the VOTE on the FLIP has completed. (I think you can also bring up the VOTE thread, if you want!)

If you encounter any problems during this period, feel free to tell me, and we can solve them together. :)

Best,
Jincheng

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Hequn Cheng <chenghe...@gmail.com> wrote on Fri, 23 Aug 2019 at 11:54 AM:

+1 for starting the vote.
Thanks a lot for the discussion, Jincheng.

Best,
Hequn

On Fri, Aug 23, 2019 at 10:06 AM Dian Fu <dian0511...@gmail.com> wrote:

Hi Jincheng,

+1 to creating the FLIP and starting the VOTE on this feature. I'm willing to help create the FLIP if you don't mind. As I haven't created a FLIP before, it would be great if you could help with this. :)

Regards,
Dian

On 22 Aug 2019, at 11:41 PM, jincheng sun <sunjincheng...@gmail.com> wrote:

Hi all,

Thanks a lot for your feedback. If there are no more suggestions or comments, I think it's time to initiate a vote to create a FLIP for Apache Flink Python UDFs. What do you think?

Best,
Jincheng

jincheng sun <sunjincheng...@gmail.com> wrote on Thu, 15 Aug 2019 at 12:54 AM:

Hi Thomas,

Thanks for your confirmation and the very important reminder about bundle processing.
I have added a description of how bundle processing is performed from the perspective of checkpoints and watermarks. Feel free to leave comments if anything is not described clearly.

Best,
Jincheng

Dian Fu <dian0511...@gmail.com> wrote on Wed, 14 Aug 2019 at 10:08 AM:

Hi Thomas,

Thanks a lot for the suggestions.

Regarding bundle processing, there is a section "Checkpoint" [1] in the design doc which talks about how to handle checkpoints. However, I think you are right that we should talk more about it, such as what bundle processing is, how it affects checkpoints and watermarks, how to handle checkpoints and watermarks, etc.
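One simple way to align bundle processing with watermarks and checkpoints is to finish the current bundle before forwarding either of them, so that no result computed from an earlier element can trail the watermark, and no element is assumed to be in flight when the checkpoint is taken. The sketch below models this synchronously: the Python worker is a plain function, and the class and method names are hypothetical rather than Flink's or Beam's operator API.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, object]


class BundlingOperator:
    """Sketch of an operator that batches elements for a (simulated) Python worker."""

    def __init__(self, worker: Callable[[List[object]], List[object]]):
        self._worker = worker           # stands in for the remote Python process
        self._bundle: List[object] = []
        self.output: List[Event] = []   # what would be emitted downstream, in order

    def process_element(self, element: object) -> None:
        self._bundle.append(element)

    def finish_bundle(self) -> None:
        # Send the buffered elements to the worker and emit all of its results.
        if self._bundle:
            for result in self._worker(self._bundle):
                self.output.append(("record", result))
            self._bundle = []

    def process_watermark(self, timestamp: int) -> None:
        # Results of elements received before the watermark must be emitted first.
        self.finish_bundle()
        self.output.append(("watermark", timestamp))

    def snapshot_state(self, checkpoint_id: int) -> None:
        # Drain the bundle so nothing is left in flight when the checkpoint is taken.
        self.finish_bundle()
        self.output.append(("checkpoint", checkpoint_id))
```

A real operator would receive results asynchronously and could hold the watermark back instead of blocking, but the ordering guarantee it has to provide is the one modeled here.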
[1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3

Regards,
Dian

On 14 Aug 2019, at 1:01 AM, Thomas Weise <t...@apache.org> wrote:

Hi Jincheng,

Thanks for putting this together. The proposal is very detailed and thorough, and for me as a Beam Flink runner contributor, easy to understand :)

One thing that you should probably detail more is bundle processing. It is critically important for performance that multiple elements are processed in a bundle. The default bundle size in the Flink runner is 1s or 1000 elements, whichever comes first.
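The default bundle size mentioned above (1s or 1000 elements, whichever comes first) amounts to a count-or-time trigger. A minimal sketch with hypothetical names, using an injectable clock so it can be tested deterministically:

```python
import time
from typing import Callable, List, Optional


class BundleTrigger:
    """Close a bundle after max_elements or max_time_s, whichever comes first."""

    def __init__(self, max_elements: int = 1000, max_time_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_elements = max_elements
        self.max_time_s = max_time_s
        self._clock = clock
        self._elements: List[object] = []
        self._started_at: Optional[float] = None

    def add(self, element: object) -> Optional[List[object]]:
        """Add an element; return the finished bundle if a limit was reached, else None."""
        if not self._elements:
            self._started_at = self._clock()  # the bundle's age counts from its first element
        self._elements.append(element)
        too_many = len(self._elements) >= self.max_elements
        too_old = self._clock() - self._started_at >= self.max_time_s
        if too_many or too_old:
            bundle, self._elements = self._elements, []
            return bundle
        return None
```

This version only checks the elapsed time when an element arrives; a real operator would presumably also register a processing-time timer so that a partially filled bundle is flushed when input goes idle.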
For streaming, you can find the logic necessary to align bundle processing with watermarks and checkpointing here:
https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java

Thomas

On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com> wrote:

Hi all,

The Python Table API (without Python UDF support) is already supported and will be available in the upcoming 1.9 release. As Python UDFs are very important for Python users, we'd like to start the discussion about Python UDF support in the Python Table API. Aljoscha Krettek, Dian Fu and I have discussed this offline and have drafted a design doc [1]. It includes the following items:

- The user-defined function interfaces.
- The user-defined function execution architecture.

As mentioned by many people in the previous discussion thread [2], a portability framework was introduced in Apache Beam in its latest releases. It provides well-defined, language-neutral data structures and protocols for language-neutral user-defined function execution. This design is based on Beam's portability framework. We will introduce how to make use of Beam's portability framework for user-defined function execution: data transmission, state access, checkpointing, metrics, logging, etc.

Considering that the design relies on Beam's portability framework for Python user-defined function execution, and that not all contributors in the Flink community are familiar with Beam's portability framework, we have built a prototype [3] as a proof of concept and to make the design easier to understand.

Any feedback is welcome.
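As a rough illustration of what language-neutral data transmission involves, encoded elements can be framed so that a process in any language can split the byte stream apart again. The sketch below prefixes each encoded element with its length as an unsigned varint, loosely modeled on the length prefix that Beam's `LengthPrefixCoder` writes; it is not the Beam Fn API wire format, and the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple


def encode_varint(value: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at `pos`; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def frame(elements: Iterable[bytes]) -> bytes:
    """Concatenate elements, each preceded by its varint-encoded length."""
    return b"".join(encode_varint(len(e)) + e for e in elements)


def unframe(data: bytes) -> List[bytes]:
    """Split a framed byte string back into its elements."""
    elements, pos = [], 0
    while pos < len(data):
        length, pos = decode_varint(data, pos)
        elements.append(data[pos:pos + length])
        pos += length
    return elements
```

Because the framing only relies on byte lengths, the receiving side needs no knowledge of the sender's language or of the element types, which is the property a language-neutral protocol is after.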
Best,
Jincheng

[1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
[3] https://github.com/dianfu/flink/commits/udf_poc