I agree with Martijn. At least HDFS, S3, and OSS should be supported.
Best,
Jingsong

On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser <mart...@ververica.com> wrote:
>
> Hi Ron,
>
> The FLIP mentions that the priority will be set to support HDFS as a resource provider. I'm concerned that we end up with a partially implemented FLIP which only supports local and HDFS, and then we move on to other features, as we have seen happen with others. I would argue that we should not focus on one resource provider, but that at least S3 support is included in the same Flink release as HDFS support.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> >
> > Hi, everyone
> >
> > First of all, thanks for the valuable suggestions received about this FLIP. After some discussion, it looks like all concerns have been addressed for now, so I will start a vote on this FLIP in two or three days. Further feedback is still very welcome.
> >
> > Best,
> >
> > Ron
> >
> > > -----Original Message-----
> > > From: "刘大龙" <ld...@zju.edu.cn>
> > > Sent: 2022-04-08 10:09:46 (Friday)
> > > To: dev@flink.apache.org
> > > Subject: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > > Hi, Martijn
> > >
> > > Do you have any questions about this FLIP? Looking forward to more feedback from you.
> > >
> > > Best,
> > >
> > > Ron
> > >
> > > > -----Original Message-----
> > > > From: "刘大龙" <ld...@zju.edu.cn>
> > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > To: dev@flink.apache.org
> > > > Subject: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > > -----Original Message-----
> > > > > From: "Martijn Visser" <mart...@ververica.com>
> > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > To: dev <dev@flink.apache.org>
> > > > > Subject: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for creating the FLIP. You're talking about both local and remote resources. With regards to remote resources, how do you see this working with Flink's filesystem abstraction? I did read in the FLIP that Hadoop dependencies are not packaged, but I would hope that we do that for all filesystem implementations. I don't think it's a good idea to have any tight coupling to file system implementations, especially if at some point we could also externalize the file system implementations (like we're already doing for connectors). I think the FLIP would be better off not referring only to "Hadoop" as a remote resource provider, but to a more generic term, since there are more options than Hadoop.
> > > > >
> > > > > I'm also thinking about security/operations implications: would it be possible for bad actor X to create a JAR that influences other running jobs, leaks data or credentials, or anything else? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies who run Flink: those who open it up for everyone to use (here the feature would be welcomed) and those who need to follow certain minimum standards/have a more closed Flink ecosystem. The latter usually want to validate a JAR upfront before making it available, even at the expense of speed, because it gives them more control over what will be running in their environment.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn Visser
> > > > > https://twitter.com/MartijnVisser82
> > > > >
> > > > >
> > > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: "Peter Huang" <huangzhenqiu0...@gmail.com>
> > > > > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > > > > To: dev <dev@flink.apache.org>
> > > > > > > Subject: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > >
> > > > > > > Hi Ron,
> > > > > > >
> > > > > > > Thanks for reviving the discussion of this work. The design looks good. A small typo in the FLIP is that it is currently marked as released in 1.16.
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Peter Huang
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zhangma...@163.com> wrote:
> > > > > > >
> > > > > > > > hi Yuxia,
> > > > > > > >
> > > > > > > > Thanks for your reply. Your reminder is very important! Since we download the file to the local machine, remember to clean it up when the Flink client exits.
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Mang Zhang
> > > > > > > >
> > > > > > > >
> > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)" <luoyuxia.luoyu...@alibaba-inc.com.INVALID> wrote:
> > > > > > > > >Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit from it. The FLIP looks good to me. I just have two minor questions:
> > > > > > > > >1. For the syntax explanation, I see it's "Create .... function as identifier ....". I think the word "identifier" may not be self-descriptive, since it is actually not a random name but the name of the class that provides the implementation for the function to be created. Maybe it would be clearer to use "class_name" instead of "identifier", just like Hive [1] and Spark [2] do.
> > > > > > > > >
> > > > > > > > >2. >> If the resource used is a remote resource, it will first download the resource to a local temporary directory, which will be generated using UUID, and then register the local path to the user class loader.
> > > > > > > > >Given the above explanation in this FLIP, it seems that for a statement set like
> > > > > > > > >""
> > > > > > > > >Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > > > > >Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > > > > >""
> > > > > > > > >it'll download the resource 'hdfs://myudfs.jar' twice. So is it possible to provide some cache mechanism, so that we won't need to download/store it twice?
> > > > > > > > >
> > > > > > > > >Best regards,
> > > > > > > > >Yuxia
> > > > > > > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > > > >[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
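For illustration, a minimal sketch of the cache mechanism Yuxia asks about, combined with the UUID-named temporary directory described in the FLIP. It assumes Flink's org.apache.flink.core.fs.FileSystem API; the class and method names (ResourceJarCache, getOrDownload) are hypothetical and not part of the proposal, and the FLIP's actual implementation may differ.

import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: download each remote UDF jar at most once per client session. */
public class ResourceJarCache {

    // Maps the remote URI to its local copy, so registering udf1 and udf2 from the
    // same 'hdfs://myudfs.jar' triggers only one download.
    private final Map<URI, File> localCopies = new ConcurrentHashMap<>();

    // Local temporary directory named with a UUID, as described in the FLIP; it should
    // be cleaned up when the Flink client exits, as Mang Zhang points out above.
    private final File localDir =
            new File(System.getProperty("java.io.tmpdir"), "flink-udf-" + UUID.randomUUID());

    public File getOrDownload(URI remoteUri) throws IOException {
        File cached = localCopies.get(remoteUri);
        if (cached != null) {
            return cached;
        }
        // FileSystem.get(uri) resolves the implementation from the URI scheme (hdfs://,
        // s3://, ...), so this code is not tied to a single resource provider.
        FileSystem fs = FileSystem.get(remoteUri);
        localDir.mkdirs();
        File target = new File(localDir, new Path(remoteUri).getName());
        try (FSDataInputStream in = fs.open(new Path(remoteUri));
                FileOutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        localCopies.put(remoteUri, target);
        return target;
    }
}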
> > > > > > > > >------------------------------------------------------------------
> > > > > > > > >From: Mang Zhang <zhangma...@163.com>
> > > > > > > > >Date: 2022-03-22 11:35:24
> > > > > > > > >To: <dev@flink.apache.org>
> > > > > > > > >Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > > >
> > > > > > > > >Hi Ron, thank you so much for this suggestion, this is very good.
> > > > > > > > >In our company, when users use a custom UDF it is very inconvenient: the code needs to be packaged into the job jar, and they cannot reference an existing UDF jar or pass the jar reference in the startup command.
> > > > > > > > >If we implement this feature, users can focus on their own business development.
> > > > > > > > >I can also contribute if needed.
> > > > > > > > >
> > > > > > > > >--
> > > > > > > > >
> > > > > > > > >Best regards,
> > > > > > > > >Mang Zhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > > > >>Hi, everyone
> > > > > > > > >>
> > > > > > > > >>I would like to open a discussion on supporting advanced function DDL. This proposal is a continuation of FLIP-79, in which the Flink function DDL is defined. Until now it is only partially released, because the Flink function DDL with user-defined resources was not clearly discussed and implemented. It is an important feature: with support for registering a UDF with a custom jar resource, users can use UDFs much more easily, without having to put jars under the classpath in advance.
> > > > > > > > >>
> > > > > > > > >>Looking forward to your feedback.
> > > > > > > > >>
> > > > > > > > >>[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > > > >>
> > > > > > > > >>Best,
> > > > > > > > >>
> > > > > > > > >>Ron
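As a usage sketch only: with the DDL being proposed, registering a UDF from a remote jar through the Table API might look roughly like the code below. The function, class, and jar names are made up, and the exact syntax is still the subject of the discussion above.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FunctionDdlExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Register a UDF whose implementation class lives in a remote jar, without
        // putting the jar on the classpath in advance. The statement follows the
        // examples quoted in this thread; the final FLIP-214 syntax may differ.
        tEnv.executeSql(
                "CREATE FUNCTION my_lower AS 'com.example.udf.MyLower' "
                        + "USING JAR 'hdfs://namenode:9000/udfs/myudfs.jar'");

        // Once registered, the function is used like any other function.
        tEnv.executeSql("SELECT my_lower('HELLO')").print();
    }
}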
> > > > > > Hi, Peter. Thanks for your feedback. This work also includes your effort, thank you very much.
> > > >
> > > > Hi, Martijn
> > > > Thank you very much for the feedback, it was very useful for me.
> > > >
> > > > 1. Filesystem abstraction: With regards to remote resources, I agree with you that we should use Flink's FileSystem abstraction to support all types of file systems, including HTTP, S3, HDFS, etc., rather than binding to a specific implementation. In the first version we will give priority to supporting HDFS as a resource provider through Flink's FileSystem abstraction, because HDFS is very widely used.
> > > >
> > > > 2. Security/operations implications: The point you raise is a great one; security is an issue that needs to be considered. Your starting point is that a jar needs to have some verification done on it before it is used, to avoid insecure behavior. However, IMO the validation of the jar is supposed to be done by the platform side itself; the platform needs to ensure that users have permission to use the jar and that the jar is secure. An option cannot disable the syntax completely, because the user can still turn it back on with a SET command. I think the most correct approach is for the platform to verify, rather than the engine side. In addition, the current Connector/UDF/DataStream programs already use custom jars, and those jars can also have security issues; Flink currently does not provide an option to prohibit the use of custom jars. If a user uses a custom jar, it means the user has permission to do so, and then the user should be responsible for the security of that jar. If it was hacked, it means there are loopholes in the company's permissions/network, and they need to fix those problems. All in all, I agree with you on this point, but an option can't solve this problem.
> > > >
> > > > Best,
> > > >
> > > > Ron
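A small sketch of the FileSystem abstraction point above: Flink picks the file system implementation from the URI scheme, so the same download path can serve HDFS, S3, OSS and others, provided the corresponding filesystem implementation or plugin (for example flink-s3-fs-hadoop for s3://) is available at runtime. The URIs below are placeholders.

import org.apache.flink.core.fs.FileSystem;

import java.net.URI;

public class ResourceProviderResolution {
    public static void main(String[] args) throws Exception {
        String[] jarUris = {
            "hdfs://namenode:9000/udfs/myudfs.jar",
            "s3://my-bucket/udfs/myudfs.jar"
        };
        for (String uri : jarUris) {
            // The scheme (hdfs, s3, ...) selects the FileSystem implementation; no
            // provider-specific code is needed on the caller's side.
            FileSystem fs = FileSystem.get(URI.create(uri));
            System.out.println(uri + " -> " + fs.getClass().getName());
        }
    }
}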