Hi Wei,

Thanks a lot for drafting the FLIP and kicking off the discussion.

Big +1 for this feature. It will make it much easier for PyFlink users to use Python UDFs in SQL scenarios.
Best,
Xingbo

Hequn Cheng <he...@apache.org> wrote on Fri, Mar 13, 2020 at 5:10 PM:

> Big +1 on this feature! It would be great to extend the usage of Python UDFs
> in SQL scenarios.
> The design doc looks good from my side now. Thank you for the update.
>
> Best,
> Hequn
>
> On Tue, Mar 10, 2020 at 3:50 PM Wei Zhong <weizhong0...@gmail.com> wrote:
>
> > Hi Timo,
> >
> > Thanks for your reply.
> >
> > If we aim for option 1, it makes sense to me to include the change in
> > this FLIP, as option 1 does not change any public API. I'll update the
> > FLIP page to illustrate this.
> >
> > Best,
> > Wei
> >
> > > On Mar 9, 2020, at 17:58, Timo Walther <twal...@apache.org> wrote:
> > >
> > > Hi Wei,
> > >
> > > I agree with Dawid that we should defer the instantiation of temporary
> > > functions to compile time. In the long term, we would like to integrate
> > > FunctionCatalog as a component of CatalogManager and unify the handling
> > > of catalog objects as much as possible.
> > >
> > > We should aim for your proposed option 1. For fluent definition of
> > > functions in the Table API, we would still like to offer passing
> > > instances like `t.select(call(new ScalarFunction() { ... }))` that would
> > > be registered as temporary system functions.
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 09.03.20 09:24, Wei Zhong wrote:
> > >> Hi Dawid,
> > >> I think deferring the instantiation of temporary functions to compile
> > >> time is quite a good idea but needs further discussion. As it is
> > >> orthogonal to this FLIP, we could continue the discussion in a new
> > >> thread later. What do you think?
> > >> Best,
> > >> Wei
> > >>> On Mar 5, 2020, at 21:11, Wei Zhong <weizhong0...@gmail.com> wrote:
> > >>>
> > >>> Hi Dawid,
> > >>>
> > >>> Thanks for your suggestion.
> > >>>
> > >>> After some investigation, I have two designs in mind for deferring
> > >>> the instantiation of temporary system functions and temporary
> > >>> catalog functions to compile time.
> > >>>
> > >>> 1. FunctionCatalog accepts both FunctionDefinitions and uninstantiated
> > >>> temporary functions. The uninstantiated temporary functions will be
> > >>> instantiated when compiling. There is no public API change in this
> > >>> design, but the FunctionCatalog needs to store and process both
> > >>> FunctionDefinitions and uninstantiated temporary functions.
> > >>>
> > >>> 2. FunctionCatalog accepts only uninstantiated temporary functions.
> > >>> In this design we need to remove those APIs that accept
> > >>> FunctionDefinitions from TableEnvironment, i.e.
> > >>> `void createTemporaryFunction(String path, UserDefinedFunction functionInstance)`
> > >>> and
> > >>> `void createTemporarySystemFunction(String name, UserDefinedFunction functionInstance)`.
> > >>> But the FunctionCatalog only needs to store and process
> > >>> uninstantiated temporary functions.
> > >>>
> > >>> As I don't know the details of the plan to store temporary functions
> > >>> as catalog functions instead of FunctionDefinitions, I'm not sure
> > >>> which solution fits better. It would be great if you could share more
> > >>> details or some thoughts on these two solutions.
> > >>>
> > >>> Best,
> > >>> Wei
> > >>>
> > >>>> On Mar 4, 2020, at 16:17, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>> I had a really quick look and from my perspective the proposal looks
> > >>>> fine.
> > >>>> I share Jark's opinion that the instantiation could be done at a later
> > >>>> stage. I agree with Wei that it requires some changes in the internal
> > >>>> implementation of the FunctionCatalog, to store temporary functions as
> > >>>> catalog functions instead of FunctionDefinitions, but we have that on
> > >>>> our agenda anyway. I would suggest investigating whether we could do
> > >>>> that as part of this FLIP already. Nevertheless, in theory this can
> > >>>> also be done later.
> > >>>>
> > >>>> Best,
> > >>>> Dawid
> > >>>>
> > >>>> On Mon, 2 Mar 2020, 14:58 Jark Wu, <imj...@gmail.com> wrote:
> > >>>>
> > >>>>> Thanks for the explanation, Wei!
> > >>>>>
> > >>>>> On Mon, 2 Mar 2020 at 20:59, Wei Zhong <weizhong0...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hi Jark,
> > >>>>>>
> > >>>>>> Thanks for your suggestion.
> > >>>>>>
> > >>>>>> Actually, the timing of starting a Python process depends on the UDF
> > >>>>>> type, because the Python process is used to provide the necessary
> > >>>>>> information to instantiate the FunctionDefinition object of the
> > >>>>>> Python UDF. For a catalog function, the FunctionDefinition will be
> > >>>>>> instantiated when compiling the job, which means the Python process
> > >>>>>> is required during compilation rather than registration. For
> > >>>>>> temporary system functions and temporary catalog functions, the
> > >>>>>> FunctionDefinition will be instantiated during UDF registration, so
> > >>>>>> the Python process needs to be started at that time.
> > >>>>>>
> > >>>>>> But this FLIP will only support registering temporary system
> > >>>>>> functions and temporary catalog functions in SQL DDL, because
> > >>>>>> registering a Python UDF to a catalog is not supported yet. We plan
> > >>>>>> to support the registration of Python catalog functions (via Table
> > >>>>>> API and SQL DDL) in a separate FLIP.
> > >>>>>> I'll add a non-goal section to the FLIP page to illustrate this.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Wei
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Mar 2, 2020, at 15:11, Jark Wu <imj...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi Weizhong,
> > >>>>>>>
> > >>>>>>> Thanks for proposing this feature. In general, I'm +1 from the
> > >>>>>>> table's view.
> > >>>>>>>
> > >>>>>>> I have one suggestion: I think registering a Python function into
> > >>>>>>> the catalog doesn't need to start up a Python process (the "High
> > >>>>>>> Level Sequence Diagram" in your FLIP).
> > >>>>>>> Because only meta-information is persisted into the catalog, we
> > >>>>>>> don't need to store "return type" and "input types" in the catalog.
> > >>>>>>> I guess the Python process is required when compiling a SQL job.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jark
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, 28 Feb 2020 at 19:04, Benchao Li <libenc...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Big +1 for this feature.
> > >>>>>>>>
> > >>>>>>>> We built our SQL platform on the Java Table API, and most common
> > >>>>>>>> UDFs are implemented in Java. However, some Python developers are
> > >>>>>>>> not familiar with Java/Scala, and it's very inconvenient for these
> > >>>>>>>> users to use UDFs in SQL.
> > >>>>>>>>
> > >>>>>>>> Wei Zhong <weizhong0...@gmail.com> wrote on Fri, Feb 28, 2020 at 6:58 PM:
> > >>>>>>>>
> > >>>>>>>>> Thanks for your reply, Dan!
> > >>>>>>>>>
> > >>>>>>>>> By the way, this FLIP is closely related to the SQL API. @Jark Wu
> > >>>>>>>>> <imj...@gmail.com> @Timo <twal...@apache.org> could you please
> > >>>>>>>>> take a look?
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Wei
> > >>>>>>>>>
> > >>>>>>>>>> On Feb 25, 2020, at 16:25, zoudan <zoud...@163.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> +1 for supporting Python UDFs in the Java/Scala Table API.
> > >>>>>>>>>> This is a great feature and would be helpful for Python users!
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Dan Zou
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>>
> > >>>>>>>> Benchao Li
> > >>>>>>>> School of Electronics Engineering and Computer Science, Peking University
> > >>>>>>>> Tel:+86-15650713730
> > >>>>>>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
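
For readers following the option 1 versus option 2 discussion above, here is a minimal sketch of what option 1 could look like: a catalog that accepts both already-instantiated definitions and uninstantiated temporary functions, and only instantiates the latter when they are resolved at compile time. This is not Flink's FunctionCatalog. The names `DeferredFunctionCatalogSketch`, `SimpleCatalog`, `registerInstance`, `registerDeferred`, and `resolveAtCompileTime` are all hypothetical, and the nested `FunctionDefinition` interface is only a stand-in for the real Flink type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class DeferredFunctionCatalogSketch {

    /** Hypothetical stand-in for Flink's FunctionDefinition. */
    public interface FunctionDefinition {
        String describe();
    }

    /** Hypothetical catalog that stores both kinds of entries (option 1). */
    public static final class SimpleCatalog {
        // Functions registered as ready-made instances.
        private final Map<String, FunctionDefinition> instantiated = new HashMap<>();
        // Functions registered without an instance; created on first resolve.
        private final Map<String, Supplier<FunctionDefinition>> deferred = new HashMap<>();

        public void registerInstance(String name, FunctionDefinition definition) {
            deferred.remove(name);
            instantiated.put(name, definition);
        }

        public void registerDeferred(String name, Supplier<FunctionDefinition> factory) {
            instantiated.remove(name);
            deferred.put(name, factory);
        }

        /** Called when a job is compiled; instantiates deferred entries once. */
        public FunctionDefinition resolveAtCompileTime(String name) {
            FunctionDefinition definition = instantiated.get(name);
            if (definition != null) {
                return definition;
            }
            Supplier<FunctionDefinition> factory = deferred.remove(name);
            if (factory == null) {
                throw new IllegalArgumentException("Unknown function: " + name);
            }
            definition = factory.get();
            instantiated.put(name, definition);
            return definition;
        }

        public boolean isInstantiated(String name) {
            return instantiated.containsKey(name);
        }
    }

    public static void main(String[] args) {
        SimpleCatalog catalog = new SimpleCatalog();
        // Registration only records how to build the function.
        catalog.registerDeferred("py_add", () -> () -> "py_add");
        System.out.println(catalog.isInstantiated("py_add"));
        // Resolution (the "compile time" step) triggers instantiation.
        System.out.println(catalog.resolveAtCompileTime("py_add").describe());
        System.out.println(catalog.isInstantiated("py_add"));
    }
}
```

In this sketch the cost of instantiation (for a Python UDF, starting the Python process) would be paid inside the supplier, so it moves from registration to the first compile-time lookup. Option 2 would correspond to dropping `registerInstance` and keeping only the deferred map.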