Hi Ron,

Thanks for creating the FLIP. You're talking about both local and remote resources. With regard to remote resources, how do you see this working with Flink's filesystem abstraction? I did read in the FLIP that Hadoop dependencies are not packaged, but I would hope that we do that for all filesystem implementations. I don't think it's a good idea to have any tight coupling to filesystem implementations, especially if at some point we could also externalize filesystem implementations (like we're already doing for connectors). I think the FLIP would be better if it didn't refer only to "Hadoop" as a remote resource provider, but used a more generic term, since there are more options than Hadoop.
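To make the point about avoiding Hadoop-specific handling a bit more concrete, here is a rough sketch of choosing a resource provider purely by URI scheme. The names `ResourceProviders` and `Provider` are made up for illustration and are not part of the FLIP or of Flink; in Flink itself this role would presumably fall to the existing filesystem abstraction rather than to a new registry.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry: picks a resource provider by URI scheme, with no Hadoop-specific branch. */
final class ResourceProviders {

    /** Hypothetical extension point that each filesystem plugin would implement. */
    interface Provider {
        String name();
    }

    private final Map<String, Provider> byScheme = new ConcurrentHashMap<>();

    /** Registers a provider for a scheme such as "hdfs", "s3" or "file" (case-insensitive). */
    void register(String scheme, Provider provider) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), provider);
    }

    /** Returns the provider for the URI's scheme; a URI without a scheme is treated as local. */
    Optional<Provider> forUri(URI uri) {
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme();
        return Optional.ofNullable(byScheme.get(scheme.toLowerCase(Locale.ROOT)));
    }
}
```

With something along these lines, `USING JAR 'hdfs://...'` and `USING JAR 's3://...'` would go through the same code path, and an unknown scheme would fail with a clear "no provider" result instead of a missing Hadoop class.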
I'm also thinking about security/operations implications: would it be possible for a bad actor to create a JAR that influences other running jobs, leaks data or credentials, or does anything else harmful? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies that run Flink: those that open it up for everyone to use (here the feature would be welcomed) and those that need to follow certain minimum standards or have a more closed Flink ecosystem. The latter usually want to validate a JAR upfront before making it available, even at the expense of speed, because it gives them more control over what will be running in their environment.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82

On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:

> > -----Original Message-----
> > From: "Peter Huang" <huangzhenqiu0...@gmail.com>
> > Sent: 2022-03-23 11:13:32 (Wednesday)
> > To: dev <dev@flink.apache.org>
> > Cc:
> > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> >
> > Hi Ron,
> >
> > Thanks for reviving the discussion of the work. The design looks good. A
> > small typo in the FLIP is that currently it is marked as released in 1.16.
> >
> > Best Regards
> > Peter Huang
> >
> > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zhangma...@163.com> wrote:
> >
> > > Hi Yuxia,
> > >
> > > Thanks for your reply. Your reminder is very important!
> > >
> > > Since we download the file locally, remember to clean it up when the
> > > Flink client exits.
> > >
> > > --
> > >
> > > Best regards,
> > > Mang Zhang
> > >
> > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > <luoyuxia.luoyu...@alibaba-inc.com.INVALID> wrote:
> > > >Hi Ron, Thanks for starting this discussion; some Spark/Hive users will
> > > >benefit from it. The FLIP looks good to me.
> > > >I just have two minor questions:
> > > >1. For the syntax explanation, I see it's "Create .... function as
> > > >identifier....". I think the word "identifier" may not be
> > > >self-descriptive, because it's actually not a random name but the name
> > > >of the class that provides the implementation of the function to be
> > > >created.
> > > >It may be clearer to use "class_name" instead of "identifier", just
> > > >like Hive[1]/Spark[2] do.
> > > >
> > > >2. >> If the resource used is a remote resource, it will first download
> > > >the resource to a local temporary directory, which will be generated
> > > >using UUID, and then register the local path to the user class loader.
> > > >Given the above explanation in this FLIP, it seems that for a statement
> > > >set such as
> > > >""
> > > >Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > >Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > >""
> > > >it'll download the resource 'hdfs://myudfs.jar' twice. So is it
> > > >possible to provide some cache mechanism so that we won't need to
> > > >download / store it twice?
> > > >
> > > >Best regards,
> > > >Yuxia
> > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > >[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > >------------------------------------------------------------------
> > > >From: Mang Zhang<zhangma...@163.com>
> > > >Date: 2022-03-22 11:35:24
> > > >To: <dev@flink.apache.org>
> > > >Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > >Hi Ron, Thank you so much for this suggestion, this is so good.
> > > >In our company, using a custom UDF is very inconvenient for users:
> > > >the code needs to be packaged into the job jar,
> > > >and the job cannot simply refer to an existing UDF jar.
> > > >The alternative is to pass in the jar reference in the startup command.
> > > >If we implement this feature, users can focus on their own business
> > > >development.
> > > >I can also contribute if needed.
> > > >
> > > >--
> > > >
> > > >Best regards,
> > > >Mang Zhang
> > > >
> > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > >>Hi, everyone
> > > >>
> > > >>I would like to open a discussion on supporting advanced Function DDL.
> > > >>This proposal is a continuation of FLIP-79, in which the Flink Function
> > > >>DDL is defined. Until now it has only been partially released, as the
> > > >>Flink function DDL with user-defined resources was not clearly
> > > >>discussed and implemented. Registering a UDF with a custom jar resource
> > > >>is an important feature: users can use UDFs more easily without having
> > > >>to put jars under the classpath in advance.
> > > >>
> > > >>Looking forward to your feedback.
> > > >>
> > > >>[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > >>
> > > >>Best,
> > > >>
> > > >>Ron
>
> Hi, Peter, Thanks for your feedback. You also contributed to this work,
> thank you very much.