Re: Reply: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL

刘大龙 Wed, 23 Mar 2022 08:43:26 -0700

> -----Original Message-----
> From: "罗宇侠(莫辞)" <luoyuxia.luoyu...@alibaba-inc.com.INVALID>
> Sent: 2022-03-23 10:02:26 (Wednesday)
> To: "Mang Zhang" <zhangma...@163.com>, "Flink Dev" <dev@flink.apache.org>
> Cc: 
> Subject: Reply: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit 
> from it. The FLIP looks good to me. I just have two minor questions:
> 1. In the syntax explanation, I see it's "Create .... function as 
> identifier....". I think the word "identifier" may not be self-descriptive, 
> because it is not an arbitrary name but the name of the class that provides 
> the implementation of the function to be created.
> It may be clearer to use "class_name" instead of "identifier", just like 
> what Hive[1]/Spark[2] do.
> 
> 2.  >> If the resource used is a remote resource, it will first download the 
> resource to a local temporary directory, which will be generated using UUID, 
> and then register the local path to the user class loader.
> Given the explanation above from the FLIP, it seems that for statements such as
> ""
> Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> ""
> it will download the resource 'hdfs://myudfs.jar' twice. Is it possible 
> to provide a cache mechanism so that we don't need to download and store it 
> twice?
>  
> 
> Best regards,
> Yuxia
> [1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> [2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> ------------------------------------------------------------------
> From: Mang Zhang<zhangma...@163.com>
> Date: 2022-03-22 11:35:24
> To: <dev@flink.apache.org>
> Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron, thank you so much for this proposal; it is very good.
> In our company, using custom UDFs is very inconvenient: the 
> code needs to be packaged into the job jar, 
> and a job cannot simply refer to an existing UDF jar. 
> Otherwise, the jar reference has to be passed in the startup command.
> If we implement this feature, users can focus on their own business 
> development.
> I can also contribute if needed.
> 
> --
> 
> Best regards,
> Mang Zhang
> 
> At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> >Hi, everyone
> >
> >I would like to open a discussion on supporting advanced Function DDL. This 
> >proposal is a continuation of FLIP-79, in which the Flink Function DDL is 
> >defined. So far it has only been partially released, because the function DDL 
> >with user-defined resources was not clearly discussed or implemented. 
> >Supporting registration of a UDF with a custom jar resource is an important 
> >feature: users can use UDFs more easily without having to put jars on the 
> >classpath in advance.
> >
> >Looking forward to your feedback.
> >
> >[1] 
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> >
> >Best,
> >
> >Ron

Hi Yuxia, thanks for your feedback and your good advice.

1. I think you are right: "identifier" must be the name of the class that 
provides the implementation of the function. We should use "class_name" 
instead of "identifier".

2. Yes, we should cache the resource locally when the URLs are the same. This 
will be handled in the code implementation.
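
To make point 2 concrete, here is a minimal sketch of one way the caching 
could work. It is only an illustration under assumptions: the class 
ResourceCache, the Downloader hook, and the file name "resource.jar" are 
hypothetical names and are not taken from the FLIP or from Flink. The idea is 
to key the local copy by the resource URL, so two functions that use the same 
jar trigger a single download into a UUID-named directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration; not part of Flink or FLIP-214. */
public class ResourceCache {

    /** Hypothetical hook that copies a remote resource to a local file. */
    public interface Downloader {
        void fetch(String url, Path target) throws IOException;
    }

    private final Path baseDir;
    private final Downloader downloader;
    // Maps each resource URL to the local path it was downloaded to.
    private final Map<String, Path> localCopies = new ConcurrentHashMap<>();

    public ResourceCache(Path baseDir, Downloader downloader) {
        this.baseDir = baseDir;
        this.downloader = downloader;
    }

    /** Returns the local copy for a URL, downloading it only on first use. */
    public Path resolve(String url) {
        return localCopies.computeIfAbsent(url, u -> {
            try {
                // One UUID-named temporary directory per distinct URL.
                Path dir = Files.createDirectories(
                        baseDir.resolve(UUID.randomUUID().toString()));
                Path target = dir.resolve("resource.jar");
                downloader.fetch(u, target);
                return target;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

With something like this, both CREATE FUNCTION statements from the example 
would resolve 'hdfs://myudfs.jar' to the same local path, and that path would 
be registered with the user class loader once.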