I know Hive and Spark can shadow built-in functions by temporary function. Mysql, Oracle, Sql server can not shadow. User can use full names to access functions instead of shadowing.
So I think it is a completely new thing, and the direct way to deal with new things is to add new grammar. So, +1 for #2, +0 for #3, -1 for #1 Best, Jingsong Lee ------------------------------------------------------------------ From:Kurt Young <ykt...@gmail.com> Send Time:2019年9月19日(星期四) 16:43 To:dev <dev@flink.apache.org> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog And let me make my vote complete: -1 for #1 +1 for #2 with different keyword -0 for #3 Best, Kurt On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <ykt...@gmail.com> wrote: > Looks like I'm the only person who is willing to +1 to #2 for now :-) > But I would suggest to change the keyword from GLOBAL to > something like BUILTIN. > > I think #2 and #3 are almost the same proposal, just with different > format to indicate whether it want to override built-in functions. > > My biggest reason to choose it is I want this behavior be consistent > with temporal tables. I will give some examples to show the behavior > and also make sure I'm not misunderstanding anything here. > > For most DBs, when user create a temporary table with: > > CREATE TEMPORARY TABLE t1 > > It's actually equivalent with: > > CREATE TEMPORARY TABLE `curent_db`.t1 > > If user change current database, they will not be able to access t1 without > fully qualified name, .i.e db1.t1 (assuming db1 is current database when > this temporary table is created). > > Only #2 and #3 followed this behavior and I would vote for this since this > makes such behavior consistent through temporal tables and functions. > > Why I'm not voting for #3 is a special catalog and database just looks very > hacky to me. It gave a imply that our built-in functions saved at a > special > catalog and database, which is actually not. Introducing a dedicated > keyword > like CREATE TEMPORARY BUILTIN FUNCTION looks more clear and > straightforward. One can argue that we should avoid introducing new > keyword, > but it's also very rare that a system can overwrite built-in functions. > Since we > decided to support this, introduce a new keyword is not a big deal IMO. > > Best, > Kurt > > > On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com> > wrote: > >> Hi, >> >> It is a quite long discussion to follow and I hope I didn’t misunderstand >> anything. From the proposals presented by Xuefu I would vote: >> >> -1 for #1 and #2 >> +1 for #3 >> >> Besides #3 being IMO more general and more consistent, having qualified >> names (#3) would help/make easier for someone to use cross >> databases/catalogs queries (joining multiple data sets/streams). For >> example with some functions to manipulate/clean up/convert the stored data >> in different catalogs registered in the respective catalogs. >> >> Piotrek >> >> > On 19 Sep 2019, at 06:35, Jark Wu <imj...@gmail.com> wrote: >> > >> > I agree with Xuefu that inconsistent handling with all the other >> objects is >> > not a big problem. >> > >> > Regarding to option#3, the special "system.system" namespace may confuse >> > users. >> > Users need to know the set of built-in function names to know when to >> use >> > "system.system" namespace. >> > What will happen if user registers a non-builtin function name under the >> > "system.system" namespace? >> > Besides, I think it doesn't solve the "explode" problem I mentioned at >> the >> > beginning of this thread. >> > >> > So here is my vote: >> > >> > +1 for #1 >> > 0 for #2 >> > -1 for #3 >> > >> > Best, >> > Jark >> > >> > >> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <usxu...@gmail.com> wrote: >> > >> >> @Dawid, Re: we also don't need additional referencing the >> specialcatalog >> >> anywhere. >> >> >> >> True. But once we allow such reference, then user can do so in any >> possible >> >> place where a function name is expected, for which we have to handle. >> >> That's a big difference, I think. >> >> >> >> Thanks, >> >> Xuefu >> >> >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz < >> >> wysakowicz.da...@gmail.com> >> >> wrote: >> >> >> >>> @Bowen I am not suggesting introducing additional catalog. I think we >> >> need >> >>> to get rid of the current built-in catalog. >> >>> >> >>> @Xuefu in option #3 we also don't need additional referencing the >> special >> >>> catalog anywhere else besides in the CREATE statement. The resolution >> >>> behaviour is exactly the same in both options. >> >>> >> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <usxu...@gmail.com> wrote: >> >>> >> >>>> Hi Dawid, >> >>>> >> >>>> "GLOBAL" is a temporary keyword that was given to the approach. It >> can >> >> be >> >>>> changed to something else for better. >> >>>> >> >>>> The difference between this and the #3 approach is that we only need >> >> the >> >>>> keyword for this create DDL. For other places (such as function >> >>>> referencing), no keyword or special namespace is needed. >> >>>> >> >>>> Thanks, >> >>>> Xuefu >> >>>> >> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz < >> >>>> wysakowicz.da...@gmail.com> >> >>>> wrote: >> >>>> >> >>>>> Hi, >> >>>>> I think it makes sense to start voting at this point. >> >>>>> >> >>>>> Option 1: Only 1-part identifiers >> >>>>> PROS: >> >>>>> - allows shadowing built-in functions >> >>>>> CONS: >> >>>>> - incosistent with all the other objects, both permanent & temporary >> >>>>> - does not allow shadowing catalog functions >> >>>>> >> >>>>> Option 2: Special keyword for built-in function >> >>>>> I think this is quite similar to the special catalog/db. The thing I >> >> am >> >>>>> strongly against in this proposal is the GLOBAL keyword. This >> keyword >> >>>> has a >> >>>>> meaning in rdbms systems and means a function that is present for a >> >>>>> lifetime of a session in which it was created, but available in all >> >>> other >> >>>>> sessions. Therefore I really don't want to use this keyword in a >> >>>> different >> >>>>> context. >> >>>>> >> >>>>> Option 3: Special catalog/db >> >>>>> >> >>>>> PROS: >> >>>>> - allows shadowing built-in functions >> >>>>> - allows shadowing catalog functions >> >>>>> - consistent with other objects >> >>>>> CONS: >> >>>>> - we introduce a special namespace for built-in functions >> >>>>> >> >>>>> I don't see a problem with introducing the special namespace. In the >> >>> end >> >>>> it >> >>>>> is very similar to the keyword approach. In this case the catalog/db >> >>>>> combination would be the "keyword" >> >>>>> >> >>>>> Therefore my votes: >> >>>>> Option 1: -0 >> >>>>> Option 2: -1 (I might change to +0 if we can come up with a better >> >>>> keyword) >> >>>>> Option 3: +1 >> >>>>> >> >>>>> Best, >> >>>>> Dawid >> >>>>> >> >>>>> >> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <usxu...@gmail.com> wrote: >> >>>>> >> >>>>>> Hi Aljoscha, >> >>>>>> >> >>>>>> Thanks for the summary and these are great questions to be >> >> answered. >> >>>> The >> >>>>>> answer to your first question is clear: there is a general >> >> agreement >> >>> to >> >>>>>> override built-in functions with temp functions. >> >>>>>> >> >>>>>> However, your second and third questions are sort of related, as a >> >>>>> function >> >>>>>> reference can be either just function name (like "func") or in the >> >>> form >> >>>>> or >> >>>>>> "cat.db.func". When a reference is just function name, it can mean >> >>>>> either a >> >>>>>> built-in function or a function defined in the current cat/db. If >> >> we >> >>>>>> support overriding a built-in function with a temp function, such >> >>>>>> overriding can also cover a function in the current cat/db. >> >>>>>> >> >>>>>> I think what Timo referred as "overriding a catalog function" >> >> means a >> >>>>> temp >> >>>>>> function defined as "cat.db.func" overrides a catalog function >> >> "func" >> >>>> in >> >>>>>> cat/db even if cat/db is not current. To support this, temp >> >> function >> >>>> has >> >>>>> to >> >>>>>> be tied to a cat/db. What's why I said above that the 2nd and 3rd >> >>>>> questions >> >>>>>> are related. The problem with such support is the ambiguity when >> >> user >> >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func >> >>> ...". >> >>>>>> Here "func" can means a global temp function, or a temp function in >> >>>>> current >> >>>>>> cat/db. If we can assume the former, this creates an inconsistency >> >>>>> because >> >>>>>> "CREATE FUNCTION func" actually means a function in current cat/db. >> >>> If >> >>>> we >> >>>>>> assume the latter, then there is no way for user to create a global >> >>>> temp >> >>>>>> function. >> >>>>>> >> >>>>>> Giving a special namespace for built-in functions may solve the >> >>>> ambiguity >> >>>>>> problem above, but it also introduces artificial catalog/database >> >>> that >> >>>>>> needs special treatment and pollutes the cleanness of the code. I >> >>>> would >> >>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE >> >>>>>> [GLOBAL] TEMPORARY FUNCTION func". >> >>>>>> >> >>>>>> Thus, I'd like to summarize a few candidate proposals for voting >> >>>>> purposes: >> >>>>>> >> >>>>>> 1. Support only global, temporary functions without namespace. Such >> >>>> temp >> >>>>>> functions overrides built-in functions and catalog functions in >> >>> current >> >>>>>> cat/db. The resolution order is: temp functions -> built-in >> >> functions >> >>>> -> >> >>>>>> catalog functions. (Partially or fully qualified functions has no >> >>>>>> ambiguity!) >> >>>>>> >> >>>>>> 2. In addition to #1, support creating and referencing temporary >> >>>>> functions >> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global >> >>> temp >> >>>>>> functions. The resolution order is: global temp functions -> >> >> built-in >> >>>>>> functions -> temp functions in current cat/db -> catalog function. >> >>>>>> (Resolution for partially or fully qualified function reference is: >> >>>> temp >> >>>>>> functions -> persistent functions.) >> >>>>>> >> >>>>>> 3. In addition to #1, support creating and referencing temporary >> >>>>> functions >> >>>>>> associated with a cat/db with a special namespace for built-in >> >>>> functions >> >>>>>> and global temp functions. The resolution is the same as #2, except >> >>>> that >> >>>>>> the special namespace might be prefixed to a reference to a >> >> built-in >> >>>>>> function or global temp function. (In absence of the special >> >>> namespace, >> >>>>> the >> >>>>>> resolution order is the same as in #2.) >> >>>>>> >> >>>>>> My personal preference is #1, given the unknown use case and >> >>> introduced >> >>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative. >> >>>> Thus, >> >>>>>> my votes are: >> >>>>>> >> >>>>>> +1 for #1 >> >>>>>> +0 for #2 >> >>>>>> -1 for #3 >> >>>>>> >> >>>>>> Everyone, please cast your vote (in above format please!), or let >> >> me >> >>>> know >> >>>>>> if you have more questions or other candidates. >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Xuefu >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek < >> >>> aljos...@apache.org> >> >>>>>> wrote: >> >>>>>> >> >>>>>>> Hi, >> >>>>>>> >> >>>>>>> I think this discussion and the one for FLIP-64 are very >> >> connected. >> >>>> To >> >>>>>>> resolve the differences, think we have to think about the basic >> >>>>>> principles >> >>>>>>> and find consensus there. The basic questions I see are: >> >>>>>>> >> >>>>>>> - Do we want to support overriding builtin functions? >> >>>>>>> - Do we want to support overriding catalog functions? >> >>>>>>> - And then later: should temporary functions be tied to a >> >>>>>>> catalog/database? >> >>>>>>> >> >>>>>>> I don’t have much to say about these, except that we should >> >>> somewhat >> >>>>>> stick >> >>>>>>> to what the industry does. But I also understand that the >> >> industry >> >>> is >> >>>>>>> already very divided on this. >> >>>>>>> >> >>>>>>> Best, >> >>>>>>> Aljoscha >> >>>>>>> >> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <imj...@gmail.com> wrote: >> >>>>>>>> >> >>>>>>>> Hi, >> >>>>>>>> >> >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We >> >>> are >> >>>>>>> close to the truth. It will waste a lot of time if we resume the >> >>>> topic >> >>>>>> some >> >>>>>>> time later. >> >>>>>>>> >> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s >> >>> “cat.db.fun” >> >>>>> way >> >>>>>>> to override a catalog function. >> >>>>>>>> >> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a >> >>> nonexistent >> >>>>> cat >> >>>>>>> & db? And we still need to do special treatment for the dedicated >> >>>>>>> system.system cat & db? >> >>>>>>>> >> >>>>>>>> Best, >> >>>>>>>> Jark >> >>>>>>>> >> >>>>>>>> >> >>>>>>>>> 在 2019年9月18日,06:54,Timo Walther <twal...@apache.org> 写道: >> >>>>>>>>> >> >>>>>>>>> Hi everyone, >> >>>>>>>>> >> >>>>>>>>> @Xuefu: I would like to avoid adding too many things >> >>>> incrementally. >> >>>>>>> Users should be able to override all catalog objects consistently >> >>>>>> according >> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If >> >>>>> functions >> >>>>>>> are treated completely different, we need more code and special >> >>>> cases. >> >>>>>> From >> >>>>>>> an implementation perspective, this topic only affects the lookup >> >>>> logic >> >>>>>>> which is rather low implementation effort which is why I would >> >> like >> >>>> to >> >>>>>>> clarify the remaining items. As you said, we have a slight >> >> consenus >> >>>> on >> >>>>>>> overriding built-in functions; we should also strive for reaching >> >>>>>> consensus >> >>>>>>> on the remaining topics. >> >>>>>>>>> >> >>>>>>>>> @Dawid: I like your idea as it ensures registering catalog >> >>> objects >> >>>>>>> consistent and the overriding of built-in functions more >> >> explicit. >> >>>>>>>>> >> >>>>>>>>> Thanks, >> >>>>>>>>> Timo >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On 17.09.19 11:59, kai wang wrote: >> >>>>>>>>>> hi, everyone >> >>>>>>>>>> I think this flip is very meaningful. it supports functions >> >>> that >> >>>>> can >> >>>>>> be >> >>>>>>>>>> shared by different catalogs and dbs, reducing the >> >> duplication >> >>> of >> >>>>>>> functions. >> >>>>>>>>>> >> >>>>>>>>>> Our group based on flink's sql parser module implements >> >> create >> >>>>>> function >> >>>>>>>>>> feature, stores the parsed function metadata and schema into >> >>>> mysql, >> >>>>>> and >> >>>>>>>>>> also customizes the catalog, customizes sql-client to support >> >>>>> custom >> >>>>>>>>>> schemas and functions. Loaded, but the function is currently >> >>>>> global, >> >>>>>>> and is >> >>>>>>>>>> not subdivided according to catalog and db. >> >>>>>>>>>> >> >>>>>>>>>> In addition, I very much hope to participate in the >> >> development >> >>>> of >> >>>>>> this >> >>>>>>>>>> flip, I have been paying attention to the community, but >> >> found >> >>> it >> >>>>> is >> >>>>>>> more >> >>>>>>>>>> difficult to join. >> >>>>>>>>>> thank you. >> >>>>>>>>>> >> >>>>>>>>>> Xuefu Z <usxu...@gmail.com> 于2019年9月17日周二 上午11:19写道: >> >>>>>>>>>> >> >>>>>>>>>>> Thanks to Tmo and Dawid for sharing thoughts. >> >>>>>>>>>>> >> >>>>>>>>>>> It seems to me that there is a general consensus on having >> >>> temp >> >>>>>>> functions >> >>>>>>>>>>> that have no namespaces and overwrite built-in functions. >> >> (As >> >>> a >> >>>>> side >> >>>>>>> note >> >>>>>>>>>>> for comparability, the current user defined functions are >> >> all >> >>>>>>> temporary and >> >>>>>>>>>>> having no namespaces.) >> >>>>>>>>>>> >> >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced >> >>> temp >> >>>>>>> functions >> >>>>>>>>>>> that can overwrite functions defined in a specific cat/db. >> >>>>> However, >> >>>>>>> this >> >>>>>>>>>>> idea appears orthogonal to the former and can be added >> >>>>>> incrementally. >> >>>>>>>>>>> >> >>>>>>>>>>> How about we first implement non-namespaced temp functions >> >> now >> >>>> and >> >>>>>>> leave >> >>>>>>>>>>> the door open for namespaced ones for later releases as the >> >>>>>>> requirement >> >>>>>>>>>>> might become more crystal? This also helps shorten the >> >> debate >> >>>> and >> >>>>>>> allow us >> >>>>>>>>>>> to make some progress along this direction. >> >>>>>>>>>>> >> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the >> >>>>>> temporary >> >>>>>>> temp >> >>>>>>>>>>> functions that don't have namespaces, my only concern is the >> >>>>> special >> >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as >> >>> evident >> >>>> in >> >>>>>>> treating >> >>>>>>>>>>> the built-in catalog currently. >> >>>>>>>>>>> >> >>>>>>>>>>> Thanks, >> >>>>>>>>>>> Xuefiu >> >>>>>>>>>>> >> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz < >> >>>>>>>>>>> wysakowicz.da...@gmail.com> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>> >> >>>>>>>>>>>> Hi, >> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How >> >>> about >> >>>>> we >> >>>>>>> have a >> >>>>>>>>>>>> special namespace (catalog + database) for built-in >> >> objects? >> >>>> This >> >>>>>>> catalog >> >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting. >> >>>>>>>>>>>> >> >>>>>>>>>>>> Then users could still override built-in functions, if they >> >>>> fully >> >>>>>>> qualify >> >>>>>>>>>>>> object with the built-in namespace, but by default the >> >> common >> >>>>> logic >> >>>>>>> of >> >>>>>>>>>>>> current dB & cat would be used. >> >>>>>>>>>>>> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ... >> >>>>>>>>>>>> registers temporary function in current cat & dB >> >>>>>>>>>>>> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ... >> >>>>>>>>>>>> registers temporary function in cat db >> >>>>>>>>>>>> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ... >> >>>>>>>>>>>> Overrides built-in function with temporary function >> >>>>>>>>>>>> >> >>>>>>>>>>>> The built-in/system namespace would not be writable for >> >>>> permanent >> >>>>>>>>>>> objects. >> >>>>>>>>>>>> WDYT? >> >>>>>>>>>>>> >> >>>>>>>>>>>> This way I think we can have benefits of both solutions. >> >>>>>>>>>>>> >> >>>>>>>>>>>> Best, >> >>>>>>>>>>>> Dawid >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, < >> >> twal...@apache.org >> >>>> >> >>>>>> wrote: >> >>>>>>>>>>>> >> >>>>>>>>>>>>> Hi Bowen, >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I understand the potential benefit of overriding certain >> >>>>> built-in >> >>>>>>>>>>>>> functions. I'm open to such a feature if many people >> >> agree. >> >>>>>>> However, it >> >>>>>>>>>>>>> would be great to still support overriding catalog >> >> functions >> >>>>> with >> >>>>>>>>>>>>> temporary functions in order to prototype a query even >> >>> though >> >>>> a >> >>>>>>>>>>>>> catalog/database might not be available currently or >> >> should >> >>>> not >> >>>>> be >> >>>>>>>>>>>>> modified yet. How about we support both cases? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs >> >>>>>>>>>>>>> -> creates/overrides a built-in function and never >> >>> consideres >> >>>>>>> current >> >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but >> >>>> acceptable >> >>>>>> for >> >>>>>>>>>>>>> functions I guess. >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun >> >>>>>>>>>>>>> -> creates/overrides a catalog function >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects >> >>>> (tables, >> >>>>>>> views) >> >>>>>>>>>>>>> except functions", this might change in the near future. >> >>> Take >> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an >> >>>>> example. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Thanks, >> >>>>>>>>>>>>> Timo >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote: >> >>>>>>>>>>>>>> Hi Fabian, >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable >> >>> thus I >> >>>>>>> didn't >> >>>>>>>>>>>>>> include that as a voting option, and the discussion is >> >>> mainly >> >>>>>>> between >> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Re > However, it means that temp functions are >> >> differently >> >>>>>> treated >> >>>>>>>>>>> than >> >>>>>>>>>>>>>> other db objects. >> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that >> >>>>>> functions >> >>>>>>>>>>> are >> >>>>>>>>>>>> a >> >>>>>>>>>>>>>> bit different from other objects - Flink don't have any >> >>> other >> >>>>>>>>>>> built-in >> >>>>>>>>>>>>>> objects (tables, views) except functions. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Cheers, >> >>>>>>>>>>>>>> Bowen >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> -- >> >>>>>>>>>>> Xuefu Zhang >> >>>>>>>>>>> >> >>>>>>>>>>> "In Honey We Trust!" >> >>>>>>>>>>> >> >>>>>>>>> >> >>>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>> >> >>>>>> -- >> >>>>>> Xuefu Zhang >> >>>>>> >> >>>>>> "In Honey We Trust!" >> >>>>>> >> >>>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Xuefu Zhang >> >>>> >> >>>> "In Honey We Trust!" >> >>>> >> >>> >> >> >> >> >> >> -- >> >> Xuefu Zhang >> >> >> >> "In Honey We Trust!" >> >> >> >>