Re: Re: Reply: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL

Martijn Visser Thu, 24 Mar 2022 01:18:42 -0700

Hi Ron,

Thanks for creating the FLIP. You're talking about both local and remote
resources. With regard to remote resources, how do you see this working with
Flink's filesystem abstraction? I read in the FLIP that Hadoop
dependencies are not packaged, and I would hope that the same holds for all
filesystem implementations. I don't think it's a good idea to have any tight
coupling to filesystem implementations, especially since at some point we
could also externalize them (like we're already doing for connectors).
I think the FLIP would be better if it didn't refer only to "Hadoop" as a
remote resource provider but used a more generic term, since there are more
options than Hadoop.
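
To make the "more generic term" point concrete, here is a minimal plain-Java
sketch that distinguishes local from remote resources purely by URI scheme,
without naming any provider. ResourceLocation and its methods are
hypothetical names for illustration; they are not part of Flink or of the
FLIP.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper (not a Flink API): classifies a resource by URI scheme only. */
final class ResourceLocation {
    private ResourceLocation() {}

    /** Returns true when the URI points at the local file system. */
    static boolean isLocal(URI uri) {
        String scheme = uri.getScheme();
        // Assumption: a path without a scheme (e.g. "/opt/udfs/my.jar") is local.
        return scheme == null || "file".equals(scheme.toLowerCase(Locale.ROOT));
    }

    /** Anything that is not local counts as remote; no specific provider is assumed. */
    static boolean isRemote(URI uri) {
        return !isLocal(uri);
    }
}
```

With this shape, hdfs://, s3:// or any other scheme is handled the same way,
and resolving the scheme to a concrete implementation stays behind the
filesystem abstraction.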


I'm also thinking about the security and operational implications: would it
be possible for a bad actor to create a JAR that influences other running
jobs, leaks data or credentials, or does anything else harmful? If so, I
think it would also be good to have an option to disable this feature
completely. I think there are roughly two types of companies that run Flink:
those that open it up for everyone to use (where this feature would be
welcome) and those that need to follow certain minimum standards or have a
more closed Flink ecosystem. The latter usually want to validate a JAR
upfront before making it available, even at the expense of speed, because it
gives them more control over what will be running in their environment.
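
As a rough illustration of what such a switch plus upfront validation could
look like, here is a minimal plain-Java sketch. JarGate, the featureEnabled
flag and the digest allowlist are all hypothetical; nothing like this is
defined in the FLIP or in Flink today.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical gate (not a Flink API): decides whether a user JAR may be loaded. */
final class JarGate {
    private final boolean featureEnabled;     // assumed cluster-level on/off switch
    private final Set<String> approvedSha256; // digests of JARs validated upfront

    JarGate(boolean featureEnabled, Set<String> approvedSha256) {
        this.featureEnabled = featureEnabled;
        this.approvedSha256 = new HashSet<>(approvedSha256);
    }

    /** Allows a JAR only if the feature is on and its digest was approved. */
    boolean isAllowed(byte[] jarBytes) {
        return featureEnabled && approvedSha256.contains(sha256Hex(jarBytes));
    }

    /** Lower-case hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

An open deployment would simply not install such a gate, while a closed one
could turn the feature off or populate the allowlist from its own review
process.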

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82


On Wed, 23 Mar 2022 at 16:47, 刘大龙 <[email protected]> wrote:

>
>
>
> > -----Original Message-----
> > From: "Peter Huang" <[email protected]>
> > Sent: 2022-03-23 11:13:32 (Wednesday)
> > To: dev <[email protected]>
> > Cc:
> > Subject: Re: Reply: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> >
> > Hi Ron,
> >
> > Thanks for reviving the discussion of this work. The design looks good.
> > One small error in the FLIP: it is currently marked as released in 1.16.
> >
> >
> > Best Regards
> > Peter Huang
> >
> >
> > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <[email protected]> wrote:
> >
> > > Hi Yuxia,
> > >
> > >
> > > Thanks for your reply. Your reminder is very important!
> > >
> > >
> > > Since we download the file to a local directory, we need to remember to
> > > clean it up when the Flink client exits.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Mang Zhang
> > >
> > >
> > >
> > >
> > >
> > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > <[email protected]> wrote:
> > > >Hi Ron, thanks for starting this discussion; some Spark/Hive users will
> > > >benefit from it. The FLIP looks good to me. I just have two minor
> > > >questions:
> > > >1. In the syntax explanation, I see "Create .... function as
> > > >identifier....". I think the word "identifier" may not be
> > > >self-descriptive, because it is not an arbitrary name but the name of
> > > >the class that provides the implementation of the function to be
> > > >created.
> > > >It may be clearer to use "class_name" instead of "identifier", just
> > > >as Hive[1]/Spark[2] do.
> > > >
> > > >2.  >> If the resource used is a remote resource, it will first download
> > > >the resource to a local temporary directory, which will be generated
> > > >using UUID, and then register the local path to the user class loader.
> > > >Given the above explanation in this FLIP, it seems that for a set of
> > > >statements such as
> > > >""
> > > >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > >""
> > > >the resource 'hdfs://myudfs.jar' will be downloaded twice. Is it
> > > >possible to provide a cache mechanism so that we don't need to
> > > >download and store it twice?
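
One way to picture the cache asked about above is a map from remote URI to
the local copy, filled on first use. The following is a minimal plain-Java
sketch under that assumption; ResourceCache and the injected downloader
function are hypothetical and are not the FLIP's design.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache (not the FLIP's design): one local copy per remote URI. */
final class ResourceCache {
    private final Map<String, Path> localCopies = new ConcurrentHashMap<>();
    // Assumed to fetch the remote resource and return the local path it wrote.
    private final Function<String, Path> downloader;

    ResourceCache(Function<String, Path> downloader) {
        this.downloader = downloader;
    }

    /** Downloads the resource on first use and reuses the local path afterwards. */
    Path resolve(String remoteUri) {
        return localCopies.computeIfAbsent(remoteUri, downloader);
    }
}
```

With this shape, the two statements in the example would call the downloader
once for 'hdfs://myudfs.jar' and share the resulting local path.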
> > > >
> > > >
> > > >Best regards,
> > > >Yuxia
> > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > >[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > >------------------------------------------------------------------
> > > >From: Mang Zhang<[email protected]>
> > > >Date: 2022-03-22 11:35:24
> > > >To: <[email protected]>
> > > >Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > >Hi Ron, thank you so much for this proposal; it is very good.
> > > >In our company, using custom UDFs is very inconvenient for users:
> > > >the code needs to be packaged into the job jar,
> > > >and a job cannot simply refer to an existing UDF jar.
> > > >Otherwise, the jar reference has to be passed in the startup command.
> > > >If we implement this feature, users can focus on their own business
> > > >development.
> > > >I can also contribute if needed.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >--
> > > >
> > > >Best regards,
> > > >Mang Zhang
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >At 2022-03-21 14:57:32, "刘大龙" <[email protected]> wrote:
> > > >>Hi, everyone
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>I would like to open a discussion on supporting advanced Function DDL.
> > > >>This proposal is a continuation of FLIP-79, in which the Flink
> > > >>Function DDL is defined. Until now it has been only partially
> > > >>released, as the Flink function DDL with user-defined resources was
> > > >>not clearly discussed and implemented. Supporting registration of
> > > >>UDFs with custom jar resources is an important feature: users can use
> > > >>UDFs more easily without having to put jars on the classpath in
> > > >>advance.
> > > >>
> > > >>Looking forward to your feedback.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>Best,
> > > >>
> > > >>Ron
> > > >>
> > > >>
> > > >
> > >
>
> Hi Peter, thanks for your feedback. You also contributed to this work,
> thank you very much.
>
