Hi Terry,

Thanks for the quick response. We are on the same page. For the properties
of function DDL, let's see whether there is such a need from other people.
I will start voting on the design in 24 hours.

Best Regards
Peter Huang

On Thu, Oct 31, 2019 at 3:18 AM Terry Wang <zjuwa...@gmail.com> wrote:

> Hi Peter,
>
> I’d like to share some thoughts from my side:
> 1. What's the syntax to distinguish function language?
>    +1 for using `[LANGUAGE JVM|PYTHON] USING JAR`.
> 2. How to persist function language in the backend catalog?
>    +1 for a separate field in CatalogFunction. But as to a specific
>    backend, we may persist it case by case. A special case is how
>    HiveCatalog stores the kind of CatalogFunction.
> 3. Do we really need to allow users to set a properties map for a udf?
>    There are use cases that require passing external arguments to a udf
>    for sure, but the need can also be met by passing arguments to `eval`
>    when calling the udf in sql.
>    IMO, there is not much need to support setting a properties map for a
>    udf.
> 4. Should a catalog implementation be able to decide whether it can take
>    a properties map, and which language of a udf it can persist?
>    IMO, it’s necessary for a catalog implementation to provide such
>    information. But for flink 1.10 map goal, we can just skip this part.
>
> Best,
> Terry Wang
>
> > On Oct 30, 2019, at 13:52, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
> >
> > Hi Bowen,
> >
> > I can't agree more that we should first reach an agreement on the DDL
> > syntax and focus on the MVP in the current phase.
> >
> > 1) What's the syntax to distinguish function language?
> > Currently, there are two opinions:
> >
> > - USING 'python .....'
> > - [LANGUAGE JVM|PYTHON] USING JAR '...'
> >
> > As we need to support multiple resources as HQL does, we shouldn't
> > repeat the language symbol as a suffix of each resource.
> > I would prefer option two, but I am definitely open to more comments.
> >
> > 2) How to persist function language in the backend catalog? As a k-v
> > pair in the properties map, or a dedicated field?
> > Even though language type is also a property, I think a separate field
> > in CatalogFunction is a cleaner solution.
> >
> > 3) Do we really need to allow users to set a properties map for a udf?
> > What needs to be stored there? What are they used for?
> >
> > I am considering a type of use case that uses UDFs for realtime
> > inference. The model is nested in the udf as a resource, but multiple
> > parameters are customizable. In this case, users can use properties to
> > define those parameters.
> >
> > I only have answers to these questions. For questions about the catalog
> > implementation, I hope we can collect more feedback from the community.
> >
> > Best Regards
> > Peter Huang
> >
> > On Tue, Oct 29, 2019 at 11:31 AM Bowen Li <bowenl...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> Besides all the good questions raised above, we all seem to agree to
> >> have an MVP for Flink 1.10, "to support users to create and persist a
> >> java class-based udf that's already in classpath (no extra resource
> >> loading), and use it later in queries".
> >>
> >> IIUC, to achieve that in 1.10, the following are currently the core
> >> issues/blockers we should figure out and solve as our **highest
> >> priority**:
> >>
> >> - what's the syntax to distinguish function language (java, scala,
> >>   python, etc)? we only need to implement the java one in 1.10 but
> >>   have to settle on the long term solution
> >> - how to persist function language in backend catalog? as a k-v pair
> >>   in properties map, or a dedicated field?
> >> - do we really need to allow users to set a properties map for udf?
> >>   what needs to be stored there? what are they used for?
> >> - should a catalog impl be able to decide whether it can take a
> >>   properties map (if we decide to have one), and which language of a
> >>   udf it can persist?
> >>   - E.g.
> >>     Hive metastore, which backs Flink's HiveCatalog, cannot take a
> >>     properties map and is only able to persist java udf [1], unless
> >>     we do something hacky to it
> >>
> >> I feel these questions are essential to Flink functions in the long
> >> run, but most importantly, are also the minimum scope for Flink 1.10.
> >> Aspects like resource loading security or compatibility with Hive
> >> syntax are important too, however if we focus on them now, we may not
> >> be able to get the MVP out in time.
> >>
> >> [1]
> >> - https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/Function.html
> >> - https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/FunctionType.html
> >>
> >> On Sun, Oct 27, 2019 at 8:22 PM Peter Huang <huangzhenqiu0...@gmail.com>
> >> wrote:
> >>
> >>> Hi Timo,
> >>>
> >>> Thanks for the feedback. I replied and adjusted the design
> >>> accordingly. Regarding the concern about class loading, I think we
> >>> need to distinguish the function class loading for temporary and
> >>> permanent functions.
> >>>
> >>> 1) For a permanent function, we can add it to the job graph so that
> >>> we don't need to load it multiple times for the different sessions.
> >>> 2) For a temporary function, we can register the function with a
> >>> session key, and use different class loaders in the RuntimeContext
> >>> implementation.
> >>>
> >>> I added more description in the doc. Please review it again.
> >>>
> >>> Best Regards
> >>> Peter Huang
> >>>
> >>> On Thu, Oct 24, 2019 at 2:14 AM Timo Walther <twal...@apache.org>
> >>> wrote:
> >>>
> >>>> Hi Peter,
> >>>>
> >>>> thanks for your proposal. I left some comments in the FLIP document.
> >>>> I agree with Terry that we can have an MVP in Flink 1.10 but should
> >>>> already discuss the bigger picture as a DDL string cannot be changed
> >>>> easily once released.
> >>>>
> >>>> In particular we should discuss how resources for a function are
> >>>> loaded. If they are simply added to the JobGraph they are available
> >>>> to all functions and could potentially interfere with each other,
> >>>> right?
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>> On 24.10.19 05:32, Terry Wang wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> Sorry for the late reply. Thanks for your efforts on this; I just
> >>>>> looked through your design.
> >>>>> I left some comments in the doc about the alter function section
> >>>>> and the function catalog interface.
> >>>>> IMO, the overall design is ok and we can discuss some details
> >>>>> further.
> >>>>> I also think it’s necessary to limit this awesome feature to basic
> >>>>> functions (of course better to have all :) ) in the 1.10 release.
> >>>>>
> >>>>> Best,
> >>>>> Terry Wang
> >>>>>
> >>>>>> On Oct 16, 2019, at 14:19, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Xuefu,
> >>>>>>
> >>>>>> Thank you for the feedback. I think you are pointing out a concern
> >>>>>> similar to Bowen's. Let me describe how the catalog function and
> >>>>>> function factory will be changed in the implementation section.
> >>>>>> Then, we can have more discussion in detail.
> >>>>>>
> >>>>>> Best Regards
> >>>>>> Peter Huang
> >>>>>>
> >>>>>> On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z <usxu...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thanks to Peter for the proposal!
> >>>>>>>
> >>>>>>> I left some comments in the google doc. Besides what Bowen
> >>>>>>> pointed out, I'm unclear about how things work end to end from
> >>>>>>> the document. For instance, SQL DDL-like function definition is
> >>>>>>> mentioned. I guess just having a DDL for it doesn't explain how
> >>>>>>> it's supported functionally. I think it's better to have some
> >>>>>>> clarification on what is expected work and what's for the
> >>>>>>> future.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Xuefu
> >>>>>>>
> >>>>>>> On Tue, Oct 15, 2019 at 11:05 AM Bowen Li <bowenl...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Zhenqiu,
> >>>>>>>>
> >>>>>>>> Thanks for taking on this effort!
> >>>>>>>>
> >>>>>>>> A couple of questions:
> >>>>>>>> - Though this FLIP is about function DDL, can we also think
> >>>>>>>>   about how the created functions can be mapped to
> >>>>>>>>   CatalogFunction and see if we need to modify the
> >>>>>>>>   CatalogFunction interface? Syntax changes need to be backed by
> >>>>>>>>   the backend.
> >>>>>>>> - Can we define a clearer, smaller scope targeting Flink 1.10
> >>>>>>>>   among all the proposed changes? The current overall scope
> >>>>>>>>   seems to be quite wide, and it may be unrealistic to get
> >>>>>>>>   everything in a single release, or even a couple. However, I
> >>>>>>>>   believe the most common user story can be something as simple
> >>>>>>>>   as "being able to create and persist a java class-based udf
> >>>>>>>>   and use it later in queries", which will add great value for
> >>>>>>>>   most Flink users and is achievable in 1.10.
> >>>>>>>>
> >>>>>>>> Bowen
> >>>>>>>>
> >>>>>>>> On Sun, Oct 13, 2019 at 10:46 PM Peter Huang
> >>>>>>>> <huangzhenqiu0...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Dear Community,
> >>>>>>>>>
> >>>>>>>>> FLIP-79 Flink Function DDL Support
> >>>>>>>>> <https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#>
> >>>>>>>>>
> >>>>>>>>> This proposal aims to support function DDL with the
> >>>>>>>>> consideration of SQL syntax, language compliance, and advanced
> >>>>>>>>> external UDF lib registration.
> >>>>>>>>> The Flink DDL was initiated and discussed in the design
> >>>>>>>>> <https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil>
> >>>>>>>>> [1] by Shuyi Chen and Timo. As the initial discussion mainly
> >>>>>>>>> focused on table, type and view, FLIP-69 [2] extends it with a
> >>>>>>>>> more detailed discussion of DDL for catalog, database, and
> >>>>>>>>> function. Originally, the function DDL was under the scope of
> >>>>>>>>> FLIP-69. After some discussion
> >>>>>>>>> <https://issues.apache.org/jira/browse/FLINK-7151> with the
> >>>>>>>>> community, we found that there are several ongoing efforts,
> >>>>>>>>> such as FLIP-64 [3], FLIP-65 [4], and FLIP-78 [5]. As they will
> >>>>>>>>> directly impact the SQL syntax of function DDL, the proposal
> >>>>>>>>> wants to describe the problem clearly with the consideration of
> >>>>>>>>> existing works and make sure the design aligns with the efforts
> >>>>>>>>> on API changes for temporary objects and type inference for
> >>>>>>>>> UDFs defined in different languages.
> >>>>>>>>>
> >>>>>>>>> The FLIP outlines the requirements from related works, and
> >>>>>>>>> proposes a SQL syntax to meet those requirements. The
> >>>>>>>>> corresponding implementation is also discussed. Please kindly
> >>>>>>>>> review and give feedback.
> >>>>>>>>>
> >>>>>>>>> Best Regards
> >>>>>>>>> Peter Huang
> >>>>>>>
> >>>>>>> --
> >>>>>>> Xuefu Zhang
> >>>>>>>
> >>>>>>> "In Honey We Trust!"
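
For anyone skimming the thread, the two points most people seemed to lean toward (a `[LANGUAGE JVM|PYTHON] USING JAR '...'` clause, and the language kept as a separate field rather than a k-v property) could be sketched roughly as below. This is only an illustration under loud assumptions: the grammar was not settled in this thread, `CatalogFunctionSketch`, `FunctionLanguage` and `parse_create_function` are hypothetical names and not Flink's API, and defaulting to JVM when the clause is omitted is a guess based on the java-only MVP.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FunctionLanguage(Enum):
    JVM = "JVM"
    PYTHON = "PYTHON"


@dataclass
class CatalogFunctionSketch:
    """Hypothetical stand-in for a catalog function entry (not Flink's class)."""
    name: str
    class_name: str
    # Language lives in its own field instead of a k-v pair in a properties map.
    language: FunctionLanguage = FunctionLanguage.JVM
    resources: List[str] = field(default_factory=list)


# Assumed grammar, for illustration only:
#   CREATE FUNCTION <name> AS '<class>'
#       [LANGUAGE JVM|PYTHON] [USING JAR '<uri>' [, JAR '<uri>']*]
_DDL = re.compile(
    r"^\s*CREATE\s+FUNCTION\s+(?P<name>[\w.]+)\s+AS\s+'(?P<cls>[^']+)'"
    r"(?:\s+LANGUAGE\s+(?P<lang>JVM|PYTHON))?"
    r"(?:\s+USING\s+(?P<res>JAR\s+'[^']+'(?:\s*,\s*JAR\s+'[^']+')*))?"
    r"\s*;?\s*$",
    re.IGNORECASE,
)


def parse_create_function(ddl: str) -> CatalogFunctionSketch:
    """Parse one statement of the assumed grammar into the sketch entry."""
    match = _DDL.match(ddl)
    if match is None:
        raise ValueError(f"unsupported statement: {ddl!r}")
    # Assumption: an omitted LANGUAGE clause means JVM.
    language = FunctionLanguage[(match.group("lang") or "JVM").upper()]
    # The language is stated once; each resource only carries its URI.
    resources = re.findall(r"'([^']+)'", match.group("res") or "")
    return CatalogFunctionSketch(
        name=match.group("name"),
        class_name=match.group("cls"),
        language=language,
        resources=resources,
    )
```

Calling `parse_create_function("CREATE FUNCTION f AS 'a.B'")` covers the MVP case from the thread (a class already on the classpath, no resources), while a statement with several `JAR` entries shows why the language is not repeated per resource.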