Hi Ron,

The FLIP mentions that priority will be given to supporting HDFS as a resource provider. I'm concerned that we'll end up with a partially implemented FLIP that only supports local and HDFS resources before we move on to other features, as we've seen happen with other FLIPs. I would argue that we should not focus on a single resource provider, but ensure that at least S3 support is included in the same Flink release as HDFS support.
Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> Hi, everyone
>
> First of all, thanks for the valuable suggestions received about this FLIP. After some discussion, it looks like all concerns have been addressed for now, so I will start a vote on this FLIP in two or three days. Further feedback is still very welcome.
>
> Best,
>
> Ron
>
> > -----Original Message-----
> > From: "刘大龙" <ld...@zju.edu.cn>
> > Sent: 2022-04-08 10:09:46 (Friday)
> > To: dev@flink.apache.org
> > Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> >
> > Hi, Martijn
> >
> > Do you have any questions about this FLIP? Looking forward to more of your feedback.
> >
> > Best,
> >
> > Ron
> >
> > > -----Original Message-----
> > > From: "刘大龙" <ld...@zju.edu.cn>
> > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > To: dev@flink.apache.org
> > > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > > > -----Original Message-----
> > > > From: "Martijn Visser" <mart...@ververica.com>
> > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > To: dev <dev@flink.apache.org>
> > > > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > Hi Ron,
> > > >
> > > > Thanks for creating the FLIP. You're talking about both local and remote resources. With regards to remote resources, how do you see this working with Flink's filesystem abstraction? I did read in the FLIP that Hadoop dependencies are not packaged, but I would hope that we do that for all filesystem implementations. I don't think it's a good idea to have any tight coupling to file system implementations, especially since at some point we could also externalize file system implementations (like we're already doing for connectors).
> > > > I think the FLIP would be improved by not referring only to "Hadoop" as a remote resource provider, but by using a more generic term, since there are more options than Hadoop.
> > > >
> > > > I'm also thinking about security/operations implications: would it be possible for a bad actor X to create a JAR that influences other running jobs, leaks data or credentials, or anything else? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies who run Flink: those who open it up for everyone to use (here the feature would be welcomed) and those who need to follow certain minimum standards / have a more closed Flink ecosystem. The latter usually want to validate a JAR upfront before making it available, even at the expense of speed, because it gives them more control over what will be running in their environment.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn Visser
> > > > https://twitter.com/MartijnVisser82
> > > >
> > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: "Peter Huang" <huangzhenqiu0...@gmail.com>
> > > > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > > > To: dev <dev@flink.apache.org>
> > > > > > Subject: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Thanks for reviving the discussion of this work. The design looks good. A small typo in the FLIP is that it is currently marked as released in 1.16.
> > > > > >
> > > > > > Best Regards
> > > > > > Peter Huang
> > > > > >
> > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zhangma...@163.com> wrote:
> > > > > >
> > > > > > > Hi Yuxia,
> > > > > > >
> > > > > > > Thanks for your reply.
> > > > > > > Your reminder is very important! Since we download the file to a local directory, remember to clean it up when the Flink client exits.
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Mang Zhang
> > > > > > >
> > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)" <luoyuxia.luoyu...@alibaba-inc.com.INVALID> wrote:
> > > > > > > >Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit from it. The FLIP looks good to me. I just have two minor questions:
> > > > > > > >1. For the syntax explanation, I see it's "Create ... function as identifier ...". I think the word "identifier" may not be self-descriptive, since it is actually not an arbitrary name but the name of the class that provides the implementation of the function to be created. Maybe it would be clearer to use "class_name" instead of "identifier", just like Hive[1]/Spark[2] do.
> > > > > > > >
> > > > > > > >2. >> If the resource used is a remote resource, it will first download the resource to a local temporary directory, which will be generated using a UUID, and then register the local path to the user class loader.
> > > > > > > >Given the above explanation in the FLIP, it seems that for a statement set such as
> > > > > > > >""
> > > > > > > >Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > > > >Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > > > >""
> > > > > > > >it'll download the resource 'hdfs://myudfs.jar' twice.
> > > > > > > >So is it possible to provide some cache mechanism, so that we won't need to download/store it twice?
> > > > > > > >
> > > > > > > >Best regards,
> > > > > > > >Yuxia
> > > > > > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > > >[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > > > > > >------------------------------------------------------------------
> > > > > > > >From: Mang Zhang <zhangma...@163.com>
> > > > > > > >Date: 2022-03-22 11:35:24
> > > > > > > >To: <dev@flink.apache.org>
> > > > > > > >Subject: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > >
> > > > > > > >Hi Ron, thank you so much for this suggestion; this is very good.
> > > > > > > >In our company, it is very inconvenient when users use a custom UDF: the code needs to be packaged into the job jar and cannot refer to an existing UDF jar, or the jar reference has to be passed in the startup command.
> > > > > > > >If we implement this feature, users can focus on their own business development.
> > > > > > > >I can also contribute if needed.
> > > > > > > >
> > > > > > > >--
> > > > > > > >Best regards,
> > > > > > > >Mang Zhang
> > > > > > > >
> > > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > > >>Hi, everyone
> > > > > > > >>
> > > > > > > >>I would like to open a discussion on supporting advanced function DDL. This proposal is a continuation of FLIP-79, in which Flink function DDL is defined. Until now it has only been partially released, as Flink function DDL with user-defined resources has not been clearly discussed and implemented. It is an important feature: with support for registering a UDF with a custom jar resource, users can use UDFs much more easily, without having to put jars under the classpath in advance.
> > > > > > > >>
> > > > > > > >>Looking forward to your feedback.
> > > > > > > >>
> > > > > > > >>[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > > >>
> > > > > > > >>Best,
> > > > > > > >>Ron
> > > > >
> > > > > Hi Peter, thanks for your feedback. This work also carries your effort, thank you very much.
> > >
> > > Hi Martijn,
> > > Thank you very much for the feedback, it was very useful for me.
> > > 1.
Filesystem abstraction: With regards to remote resources, I agree with you that we should use Flink's FileSystem abstraction to support all types of file systems, including HTTP, S3, and HDFS, rather than binding to a specific implementation. In the first version, we will give priority to supporting HDFS as a resource provider through Flink's FileSystem abstraction, since HDFS is very widely used.
> > >
> > > 2. Security/operations implications: The point you raise is a great one; security does need to be considered. Your starting point is that a jar should have some verification done on it before it is used, to avoid insecure behavior. However, IMO, the validation of the jar should be done by the platform side itself: the platform needs to ensure that users have permission to use the jar and that the jar is secure. An option cannot disable the syntax completely, because the user can still enable it again via the SET command. I think the right approach is for the platform to verify, rather than the engine side. In addition, current connector/UDF/DataStream programs also use custom jars, and those jars have the same security issues, yet Flink does not provide an option to prohibit the use of custom jars. If a user uses a custom jar, it means the user has permission to do so, and the user should then be responsible for the security of that jar. If it was hacked, it means there are loopholes in the company's permissions/network, and those problems need to be fixed. All in all, I agree with you on this point, but an option can't solve this problem.
> > >
> > > Best,
> > >
> > > Ron
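As an aside for readers of this thread: the cache mechanism Yuxia asked about earlier (avoiding downloading the same remote jar twice when two functions reference it) can be sketched as a URI-keyed map from remote URI to local path. This is only a minimal illustrative sketch, not the FLIP's actual implementation; the names `ResourceCache` and `fetch` are hypothetical, and the real download would go through Flink's FileSystem abstraction.

```python
import hashlib
import os
import tempfile

class ResourceCache:
    """Maps a remote resource URI to a local path, downloading on first use only."""

    def __init__(self, fetch):
        # `fetch(uri, dest)` is a caller-supplied download function
        # (in Flink it would be backed by the FileSystem abstraction).
        self._fetch = fetch
        self._local_paths = {}
        # One temp dir per session, analogous to the FLIP's UUID-named directory.
        self._tmp_dir = tempfile.mkdtemp(prefix="udf-resources-")

    def get(self, uri):
        if uri not in self._local_paths:
            # Derive a stable file name from the URI so repeated registrations
            # of the same jar share one local copy.
            name = hashlib.sha256(uri.encode()).hexdigest() + ".jar"
            dest = os.path.join(self._tmp_dir, name)
            self._fetch(uri, dest)
            self._local_paths[uri] = dest
        return self._local_paths[uri]

# Simulated download so the sketch is runnable without a real HDFS cluster.
downloads = []

def fake_fetch(uri, dest):
    downloads.append(uri)
    with open(dest, "w") as f:
        f.write("jar-bytes")

cache = ResourceCache(fake_fetch)
p1 = cache.get("hdfs://myudfs.jar")  # first use: triggers one download
p2 = cache.get("hdfs://myudfs.jar")  # second use: served from the cache
print(len(downloads))
print(p1 == p2)
```

With this shape, the two `CREATE FUNCTION ... USING JAR 'hdfs://myudfs.jar'` statements in Yuxia's example would share a single local copy, and clearing `_tmp_dir` on client exit covers Mang Zhang's cleanup reminder.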