Hi,

Regarding the Hive & Spark support of TEMPORARY FUNCTIONS: I've just performed some experiments (hive-2.3.2 & spark-2.4.4) and I think the two systems are quite inconsistent in this area (Spark being considerably worse in this respect).
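The sessions I ran looked roughly like the following sketch (a sketch only; 'com.example.MyFn' is a placeholder for whatever UDF class gets registered, not the actual class I used):

    -- Hive 2.3.2
    CREATE TEMPORARY FUNCTION length AS 'com.example.MyFn';  -- works, shadows the built-in length
    DROP TEMPORARY FUNCTION length;                          -- native length is available again
    CREATE TEMPORARY FUNCTION array AS 'com.example.MyFn';   -- fails at parse time (see below)

    -- Spark 2.4.4
    CREATE TEMPORARY FUNCTION cast AS 'com.example.MyFn';            -- fails: function already exists
    CREATE OR REPLACE TEMPORARY FUNCTION cast AS 'com.example.MyFn'; -- succeeds, but queries still use the built-in CAST
    CREATE OR REPLACE TEMPORARY FUNCTION current_date AS 'com.example.MyFn';  -- replaces the built-in entirely
    DROP TEMPORARY FUNCTION current_date;                            -- fails: cannot drop native function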
Hive: You cannot overwrite all of the built-in functions. I could overwrite most of the functions I tried, e.g. length, e, pi, round, rtrim, but there are functions I cannot overwrite, e.g. CAST and ARRAY, for which I get: /ParseException line 1:29 cannot recognize input near 'array' 'AS'/. What is interesting is that I cannot overwrite /array/, but I can overwrite /map/ or /struct/. Still, Hive behaves reasonably well once I do manage to overwrite a function: when I drop the temporary function, the native function is available again.

Spark: Spark's behavior is, imho, much worse. In theory I could overwrite all functions; I was able, for example, to overwrite the CAST function, though I had to use the CREATE OR REPLACE TEMPORARY FUNCTION syntax, otherwise I get an exception that the function already exists. However, when I then used CAST in a query, the native, built-in one was used. When I overwrote the current_date() function, my version was used in queries, but it completely replaces the built-in function and I can no longer use the native function in any way. I also cannot drop the temporary function; I get: /Error in query: Cannot drop native function 'current_date';/

As an additional note, neither system allows creating TEMPORARY FUNCTIONS with a database qualifier; temporary functions are always represented as a single name.

In my opinion, neither of the systems has consistent behavior here. Generally speaking, I think overwriting any system-provided functions is just dangerous.

Regarding Jark's concerns: such functions would be registered in the current catalog/database schema, so a user could still use their own function, but would have to fully qualify it (because built-in functions take precedence). Moreover, users would have the same problem with permanent functions. Imagine a user has a permanent function 'cat.db.explode'. In 1.9 the user could use just 'explode' as long as 'cat' & 'db' were the default catalog & database. If we introduce an 'explode' built-in function in 1.10, the user has to fully qualify the function.

Best,
Dawid

On 04/09/2019 15:19, Timo Walther wrote:
> Hi all, > > thanks for the healthy discussion. It is already a very long > discussion with a lot of text. So I will just post my opinion to a > couple of statements: > > > Hive built-in functions are not part of Flink built-in functions, > they are catalog functions > > That is not entirely true. Correct me if I'm wrong but I think Hive > built-in functions are also not catalog functions. They are not stored > in every Hive metastore catalog that is freshly created but are a set > of functions that are listed somewhere and made available. > > > ambiguous functions reference just shouldn't be resolved to a > different catalog > > I agree. They should not be resolved to a different catalog. That's > why I am suggesting to split the concept of built-in functions and > catalog lookup semantics. > > > I don't know if any other databases handle built-in functions like that > > What I called "module" is: > - Extension in Postgres [1] > - Plugin in Presto [2] > > Btw. Presto even mentions example modules that are similar to the ones > that we will introduce in the near future both for ML and System XYZ > compatibility: > "See either the presto-ml module for machine learning functions or the > presto-teradata-functions module for Teradata-compatible functions, > both in the root of the Presto source."
> > > functions should be either built-in already or just libraries > functions, and library functions can be adapted to catalog APIs or of > some other syntax to use > > Regarding "built-in already", of course we can add a lot of functions > as built-ins but we will end-up in a dependency hell in the near > future if we don't introduce a pluggable approach. Library functions > is what you also suggest but storing them in a catalog means to always > fully qualify them or modifying the existing catalog design that was > inspired by the standard. > > I don't think "it brings in even more complicated scenarios to the > design", it just does clear separation of concerns. Integrating the > functionality into the current design makes the catalog API more > complicated. > > > why would users name a temporary function the same as a built-in > function then? > > Because you never know what users do. If they don't, my suggested > resolution order should not be a problem, right? > > > I don't think hive functions deserves be a function module > > Our goal is not to create a Hive clone. We need to think forward and > Hive is just one of many systems that we can support. Not every > built-in function behaves and will behave exactly like Hive. > > > regarding temporary functions, there are few systems that support it > > IMHO Spark and Hive are not always the best examples for consistent > design. Systems like Postgres, Presto, or SQL Server should be used as > a reference. I don't think that a user can overwrite a built-in > function there. > > Regards, > Timo > > [1] https://www.postgresql.org/docs/10/extend-extensions.html > [2] https://prestodb.github.io/docs/current/develop/functions.html > > > On 04.09.19 13:44, Jark Wu wrote: >> Hi all, >> >> Regarding #1 temp function <> built-in function and naming. >> I'm fine with temp functions should precede built-in function and can >> override built-in functions (we already support to override built-in >> function in 1.9). >> If we don't allow the same name as a built-in function, I'm afraid we >> will >> have compatibility issues in the future. >> Say users register a user defined function named "explode" in 1.9, >> and we >> support a built-in "explode" function in 1.10. >> Then the user's jobs which call the registered "explode" function in 1.9 >> will all fail in 1.10 because of naming conflict. >> >> Regarding #2 "External" built-in functions. >> I think if we store external built-in functions in catalog, then >> "hive1::sqrt" is a good way to go. >> However, I would prefer to support a discovery mechanism (e.g. SPI) for >> built-in functions as Timo suggested above. >> This gives us the flexibility to add Hive or MySQL or Geo or whatever >> function set as built-in functions in an easy way. >> >> Best, >> Jark >> >> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <usxu...@gmail.com> wrote: >> >>> Hi David, >>> >>> Thank you for sharing your findings. It seems to me that there is no >>> SQL >>> standard regarding temporary functions. There are few systems that >>> support >>> it. Here are what I have found: >>> >>> 1. Hive: no DB qualifier allowed. Can overwrite built-in. >>> 2. Spark: basically follows Hive ( >>> >>> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html >>> >>> ) >>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of >>> overwriting >>> behavior. 
( >>> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html) >>> >>> >>> Because of lack of standard, it's perfectly fine for Flink to define >>> whatever it sees appropriate. Thus, your proposal (no overwriting >>> and must >>> have DB as holder) is one option. The advantage is simplicity, The >>> downside >>> is the deviation from Hive, which is popular and de facto standard >>> in big >>> data world. >>> >>> However, I don't think we have to follow Hive. More importantly, we >>> need a >>> consensus. I have no objection if your proposal is generally agreed >>> upon. >>> >>> Thanks, >>> Xuefu >>> >>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz >>> <dwysakow...@apache.org> >>> wrote: >>> >>>> Hi all, >>>> >>>> Just an opinion on the built-in <> temporary functions resolution and >>>> NAMING issue. I think we should not allow overriding the built-in >>>> functions, as this may pose serious issues and to be honest is rather >>>> not feasible and would require major rework. What happens if a user >>>> wants to override CAST? Calls to that function are generated at >>>> different layers of the stack that unfortunately does not always go >>>> through the Catalog API (at least yet). Moreover from what I've >>>> checked >>>> no other systems allow overriding the built-in functions. All the >>>> systems I've checked so far register temporary functions in a >>>> database/schema (either special database for temporary functions, or >>>> just current database). What I would suggest is to always register >>>> temporary functions with a 3 part identifier. The same way as tables, >>>> views etc. This effectively means you cannot override built-in >>>> functions. With such approach it is natural that the temporary >>>> functions >>>> end up a step lower in the resolution order: >>>> >>>> 1. built-in functions (1 part, maybe 2? - this is still under >>>> discussion) >>>> >>>> 2. temporary functions (always 3 part path) >>>> >>>> 3. catalog functions (always 3 part path) >>>> >>>> Let me know what do you think. >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> On 04/09/2019 06:13, Bowen Li wrote: >>>>> Hi, >>>>> >>>>> I agree with Xuefu that the main controversial points are mainly the >>> two >>>>> places. My thoughts on them: >>>>> >>>>> 1) Determinism of referencing Hive built-in functions. We can either >>>> remove >>>>> Hive built-in functions from ambiguous function resolution and >>>>> require >>>>> users to use special syntax for their qualified names, or add a >>>>> config >>>> flag >>>>> to catalog constructor/yaml for turning on and off Hive built-in >>>> functions >>>>> with the flag set to 'false' by default and proper doc added to help >>>> users >>>>> make their decisions. >>>>> >>>>> 2) Flink temp functions v.s. Flink built-in functions in ambiguous >>>> function >>>>> resolution order. We believe Flink temp functions should precede >>>>> Flink >>>>> built-in functions, and I have presented my reasons. Just in case >>>>> if we >>>>> cannot reach an agreement, I propose forbid users registering temp >>>>> functions in the same name as a built-in function, like MySQL's >>> approach, >>>>> for the moment. It won't have any performance concern, since built-in >>>>> functions are all in memory and thus cost of a name check will be >>> really >>>>> trivial. >>>>> >>>>> >>>>> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <usxu...@gmail.com> wrote: >>>>> >>>>>> From what I have seen, there are a couple of focal disagreements: >>>>>> >>>>>> 1. 
Resolution order: temp function --> flink built-in function --> >>>> catalog >>>>>> function vs flink built-in function --> temp function -> catalog >>>> function. >>>>>> 2. "External" built-in functions: how to treat built-in functions in >>>>>> external system and how users reference them >>>>>> >>>>>> For #1, I agree with Bowen that temp function needs to be at the >>> highest >>>>>> priority because that's how a user might overwrite a built-in >>>>>> function >>>>>> without referencing a persistent, overwriting catalog function >>>>>> with a >>>> fully >>>>>> qualified name. Putting built-in functions at the highest priority >>>>>> eliminates that usage. >>>>>> >>>>>> For #2, I saw a general agreement on referencing "external" built-in >>>>>> functions such as those in Hive needs to be explicit and >>>>>> deterministic >>>> even >>>>>> though different approaches are proposed. To limit the scope and >>> simply >>>> the >>>>>> usage, it seems making sense to me to introduce special syntax for >>>> user to >>>>>> explicitly reference an external built-in function such as >>>>>> hive1::sqrt >>>> or >>>>>> hive1._built_in.sqrt. This is a DML syntax matching nicely >>>>>> Catalog API >>>> call >>>>>> hive1.getFunction(ObjectPath functionName) where the database >>>>>> name is >>>>>> absent for bulit-in functions available in that catalog hive1. I >>>> understand >>>>>> that Bowen's original proposal was trying to avoid this, but this >>> could >>>>>> turn out to be a clean and simple solution. >>>>>> >>>>>> (Timo's modular approach is great way to "expand" Flink's built-in >>>> function >>>>>> set, which seems orthogonal and complementary to this, which >>>>>> could be >>>>>> tackled in further future work.) >>>>>> >>>>>> I'd be happy to hear further thoughts on the two points. >>>>>> >>>>>> Thanks, >>>>>> Xuefu >>>>>> >>>>>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <ykt...@gmail.com> wrote: >>>>>> >>>>>>> Thanks Timo & Bowen for the feedback. Bowen was right, my >>>>>>> proposal is >>>> the >>>>>>> same >>>>>>> as Bowen's. But after thinking about it, I'm currently lean to >>>>>>> Timo's >>>>>>> suggestion. >>>>>>> >>>>>>> The reason is backward compatibility. If we follow Bowen's >>>>>>> approach, >>>>>> let's >>>>>>> say we >>>>>>> first find function in Flink's built-in functions, and then hive's >>>>>>> built-in. For example, `foo` >>>>>>> is not supported by Flink, but hive has such built-in function. So >>> user >>>>>>> will have hive's >>>>>>> behavior for function `foo`. And in next release, Flink realize >>>>>>> this >>>> is a >>>>>>> very popular function >>>>>>> and add it into Flink's built-in functions, but with different >>> behavior >>>>>> as >>>>>>> hive's. So in next >>>>>>> release, the behavior changes. >>>>>>> >>>>>>> With Timo's approach, IIUC user have to tell the framework >>>>>>> explicitly >>>>>> what >>>>>>> kind of >>>>>>> built-in functions he would like to use. He can just tell framework >>> to >>>>>>> abandon Flink's built-in >>>>>>> functions, and use hive's instead. User can only choose between >>>>>>> them, >>>> but >>>>>>> not use >>>>>>> them at the same time. I think this approach is more predictable. >>>>>>> >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bowenl...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Thanks for the feedback. 
Just a kindly reminder that the >>>>>>>> [Proposal] >>>>>>> section >>>>>>>> in the google doc was updated, please take a look first and let me >>>> know >>>>>>> if >>>>>>>> you have more questions. >>>>>>>> >>>>>>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bowenl...@gmail.com> >>> wrote: >>>>>>>>> Hi Timo, >>>>>>>>> >>>>>>>>> Re> 1) We should not have the restriction "hive built-in >>>>>>>>> functions >>>>>> can >>>>>>>>> only >>>>>>>>>> be used when current catalog is hive catalog". Switching a >>>>>>>>>> catalog >>>>>>>>>> should only have implications on the cat.db.object resolution >>>>>>>>>> but >>>>>> not >>>>>>>>>> functions. It would be quite convinient for users to use Hive >>>>>>> built-ins >>>>>>>>>> even if they use a Confluent schema registry or just the >>>>>>>>>> in-memory >>>>>>>>> catalog. >>>>>>>>> >>>>>>>>> There might be a misunderstanding here. >>>>>>>>> >>>>>>>>> First of all, Hive built-in functions are not part of Flink >>> built-in >>>>>>>>> functions, they are catalog functions, thus if the current >>>>>>>>> catalog >>> is >>>>>>>> not a >>>>>>>>> HiveCatalog but, say, a schema registry catalog, ambiguous >>> functions >>>>>>>>> reference just shouldn't be resolved to a different catalog. >>>>>>>>> >>>>>>>>> Second, Hive built-in functions can potentially be referenced >>> across >>>>>>>>> catalog, but it doesn't have db namespace and we currently just >>> don't >>>>>>>> have >>>>>>>>> a SQL syntax for it. It can be enabled when such a SQL syntax is >>>>>>> defined, >>>>>>>>> e.g. "catalog::function", but it's out of scope of this FLIP. >>>>>>>>> >>>>>>>>> 2) I would propose to have separate concepts for catalog and >>> built-in >>>>>>>>> functions. In particular it would be nice to modularize built-in >>>>>>>>> functions. Some built-in functions are very crucial (like AS, >>>>>>>>> CAST, >>>>>>>>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and >>>>>> maybe >>>>>>>>> we add more experimental functions in the future or function for >>> some >>>>>>>>> special application area (Geo functions, ML functions). A data >>>>>> platform >>>>>>>>> team might not want to make every built-in function available. >>>>>>>>> Or a >>>>>>>>> function module like ML functions is in a different Maven module. >>>>>>>>> >>>>>>>>> I think this is orthogonal to this FLIP, especially we don't have >>> the >>>>>>>>> "external built-in functions" anymore and currently the built-in >>>>>>> function >>>>>>>>> category remains untouched. >>>>>>>>> >>>>>>>>> But just to share some thoughts on the proposal, I'm not sure >>>>>>>>> about >>>>>> it: >>>>>>>>> - I don't know if any other databases handle built-in functions >>> like >>>>>>>> that. >>>>>>>>> Maybe you can give some examples? IMHO, built-in functions are >>> system >>>>>>>> info >>>>>>>>> and should be deterministic, not depending on loaded >>>>>>>>> libraries. Geo >>>>>>>>> functions should be either built-in already or just libraries >>>>>>> functions, >>>>>>>>> and library functions can be adapted to catalog APIs or of some >>> other >>>>>>>>> syntax to use >>>>>>>>> - I don't know if all use cases stand, and many can be >>>>>>>>> achieved by >>>>>>> other >>>>>>>>> approaches too. E.g. 
experimental functions can be taken good >>>>>>>>> care >>> of >>>>>>> by >>>>>>>>> documentations, annotations, etc >>>>>>>>> - the proposal basically introduces some concept like a pluggable >>>>>>>> built-in >>>>>>>>> function catalog, despite the already existing catalog APIs >>>>>>>>> - it brings in even more complicated scenarios to the design. >>>>>>>>> E.g. >>>>>> how >>>>>>> do >>>>>>>>> you handle built-in functions in different modules but different >>>>>> names? >>>>>>>>> In short, I'm not sure if it really stands and it looks like an >>>>>>> overkill >>>>>>>>> to me. I'd rather not go to that route. Related discussion can be >>> on >>>>>>> its >>>>>>>>> own thread. >>>>>>>>> >>>>>>>>> 3) Following the suggestion above, we can have a separate >>>>>>>>> discovery >>>>>>>>> mechanism for built-in functions. Instead of just going through a >>>>>>> static >>>>>>>>> list like in BuiltInFunctionDefinitions, a platform team >>>>>>>>> should be >>>>>> able >>>>>>>>> to select function modules like >>>>>>>>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions, >>>>>>>>> HiveFunctions) or via service discovery; >>>>>>>>> >>>>>>>>> Same as above. I'll leave it to its own thread. >>>>>>>>> >>>>>>>>> re > 3) Dawid and I discussed the resulution order again. I agree >>>>>> with >>>>>>>>> Kurt >>>>>>>>>> that we should unify built-in function (external or internal) >>>>>> under a >>>>>>>>>> common layer. However, the resolution order should be: >>>>>>>>>> 1. built-in functions >>>>>>>>>> 2. temporary functions >>>>>>>>>> 3. regular catalog resolution logic >>>>>>>>>> Otherwise a temporary function could cause clashes with Flink's >>>>>>>> built-in >>>>>>>>>> functions. If you take a look at other vendors, like SQL Server >>>>>> they >>>>>>>>>> also do not allow to overwrite built-in functions. >>>>>>>>> ”I agree with Kurt that we should unify built-in function >>>>>>>>> (external >>>>>> or >>>>>>>>> internal) under a common layer.“ <- I don't think this is what >>>>>>>>> Kurt >>>>>>>> means. >>>>>>>>> Kurt and I are in favor of unifying built-in functions of >>>>>>>>> external >>>>>>>> systems >>>>>>>>> and catalog functions. Did you type a mistake? >>>>>>>>> >>>>>>>>> Besides, I'm not sure about the resolution order you proposed. >>>>>>> Temporary >>>>>>>>> functions have a lifespan over a session and are only visible to >>> the >>>>>>>>> session owner, they are unique to each user, and users create >>>>>>>>> them >>> on >>>>>>>>> purpose to be the highest priority in order to overwrite system >>> info >>>>>>>>> (built-in functions in this case). >>>>>>>>> >>>>>>>>> In your case, why would users name a temporary function the same >>> as a >>>>>>>>> built-in function then? Since using that name in ambiguous >>>>>>>>> function >>>>>>>>> reference will always be resolved to built-in functions, >>>>>>>>> creating a >>>>>>>>> same-named temp function would be meaningless in the end. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bowenl...@gmail.com> >>> wrote: >>>>>>>>>> Hi Jingsong, >>>>>>>>>> >>>>>>>>>> Re> 1.Hive built-in functions is an intermediate solution. So we >>>>>>> should >>>>>>>>>>> not introduce interfaces to influence the framework. To make >>>>>>>>>>> Flink itself more powerful, we should implement the functions >>>>>>>>>>> we need to add. >>>>>>>>>> Yes, please see the doc. >>>>>>>>>> >>>>>>>>>> Re> 2.Non-flink built-in functions are easy for users to change >>>>>> their >>>>>>>>>>> behavior. 
If we support some flink built-in functions in the >>>>>>>>>>> future but act differently from non-flink built-in, this will >>> lead >>>>>>> to >>>>>>>>>>> changes in user behavior. >>>>>>>>>> There's no such concept as "external built-in functions" any >>>>>>>>>> more. >>>>>>>>>> Built-in functions of external systems will be treated as >>>>>>>>>> special >>>>>>>> catalog >>>>>>>>>> functions. >>>>>>>>>> >>>>>>>>>> Re> Another question is, does this fallback include all >>>>>>>>>>> hive built-in functions? As far as I know, some hive functions >>>>>>>>>>> have some hacky. If possible, can we start with a white list? >>>>>>>>>>> Once we implement some functions to flink built-in, we can >>>>>>>>>>> also update the whitelist. >>>>>>>>>> Yes, that's something we thought of too. I don't think it's >>>>>>>>>> super >>>>>>>>>> critical to the scope of this FLIP, thus I'd like to leave it to >>>>>>> future >>>>>>>>>> efforts as a nice-to-have feature. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bowenl...@gmail.com> >>>>>> wrote: >>>>>>>>>>> Hi Kurt, >>>>>>>>>>> >>>>>>>>>>> Re: > What I want to propose is we can merge #3 and #4, make >>>>>>>>>>> them >>>>>>> both >>>>>>>>>>> under >>>>>>>>>>>> "catalog" concept, by extending catalog function to make it >>>>>>>>>>>> have >>>>>>>>>>> ability to >>>>>>>>>>>> have built-in catalog functions. Some benefits I can see from >>> this >>>>>>>>>>> approach: >>>>>>>>>>>> 1. We don't have to introduce new concept like external >>>>>>>>>>>> built-in >>>>>>>>>>> functions. >>>>>>>>>>>> Actually I don't see a full story about how to treat a >>>>>>>>>>>> built-in >>>>>>>>>>> functions, and it >>>>>>>>>>>> seems a little bit disrupt with catalog. As a result, you have >>> to >>>>>>> make >>>>>>>>>>> some restriction >>>>>>>>>>>> like "hive built-in functions can only be used when current >>>>>> catalog >>>>>>> is >>>>>>>>>>> hive catalog". >>>>>>>>>>> >>>>>>>>>>> Yes, I've unified #3 and #4 but it seems I didn't update some >>> part >>>>>> of >>>>>>>>>>> the doc. I've modified those sections, and they are up to date >>> now. >>>>>>>>>>> In short, now built-in function of external systems are defined >>> as >>>>>> a >>>>>>>>>>> special kind of catalog function in Flink, and handled by Flink >>> as >>>>>>>>>>> following: >>>>>>>>>>> - An external built-in function must be associated with a >>>>>>>>>>> catalog >>>>>> for >>>>>>>>>>> the purpose of decoupling flink-table and external systems. >>>>>>>>>>> - It always resides in front of catalog functions in ambiguous >>>>>>> function >>>>>>>>>>> reference order, just like in its own external system >>>>>>>>>>> - It is a special catalog function that doesn’t have a >>>>>>> schema/database >>>>>>>>>>> namespace >>>>>>>>>>> - It goes thru the same instantiation logic as other user >>>>>>>>>>> defined >>>>>>>>>>> catalog functions in the external system >>>>>>>>>>> >>>>>>>>>>> Please take another look at the doc, and let me know if you >>>>>>>>>>> have >>>>>> more >>>>>>>>>>> questions. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther >>>>>>>>>>> <twal...@apache.org> >>>>>>>> wrote: >>>>>>>>>>>> Hi Kurt, >>>>>>>>>>>> >>>>>>>>>>>> it should not affect the functions and operations we currently >>>>>> have >>>>>>> in >>>>>>>>>>>> SQL. It just categorizes the available built-in functions. 
>>>>>>>>>>>> It is >>>>>>> kind >>>>>>>>>>>> of >>>>>>>>>>>> an orthogonal concept to the catalog API but built-in >>>>>>>>>>>> functions >>>>>>>> deserve >>>>>>>>>>>> this special kind of treatment. CatalogFunction still fits >>>>>> perfectly >>>>>>>> in >>>>>>>>>>>> there because the regular catalog object resolution logic >>>>>>>>>>>> is not >>>>>>>>>>>> affected. So tables and functions are resolved in the same way >>> but >>>>>>>> with >>>>>>>>>>>> built-in functions that have priority as in the original >>>>>>>>>>>> design. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 03.09.19 15:26, Kurt Young wrote: >>>>>>>>>>>>> Does this only affect the functions and operations we >>>>>>>>>>>>> currently >>>>>>> have >>>>>>>>>>>> in SQL >>>>>>>>>>>>> and >>>>>>>>>>>>> have no effect on tables, right? Looks like this is an >>>>>> orthogonal >>>>>>>>>>>> concept >>>>>>>>>>>>> with Catalog? >>>>>>>>>>>>> If the answer are both yes, then the catalog function will >>>>>>>>>>>>> be a >>>>>>>> weird >>>>>>>>>>>>> concept? >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Kurt >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan < >>> yuzhao....@gmail.com >>>>>>>>>>>> wrote: >>>>>>>>>>>>>> The way you proposed are basically the same as what Calcite >>>>>>> does, I >>>>>>>>>>>> think >>>>>>>>>>>>>> we are in the same line. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Danny Chan >>>>>>>>>>>>>> 在 2019年9月3日 +0800 PM7:57,Timo Walther >>>>>>>>>>>>>> <twal...@apache.org >>>> ,写道: >>>>>>>>>>>>>>> This sounds exactly as the module approach I mentioned, no? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 03.09.19 13:42, Danny Chan wrote: >>>>>>>>>>>>>>>> Thanks Bowen for bring up this topic, I think it’s a >>>>>>>>>>>>>>>> useful >>>>>>>>>>>>>> refactoring to make our function usage more user friendly. >>>>>>>>>>>>>>>> For the topic of how to organize the builtin operators and >>>>>>>>>>>> operators >>>>>>>>>>>>>> of Hive, here is a solution from Apache Calcite, the Calcite >>>>>> way >>>>>>> is >>>>>>>>>>>> to make >>>>>>>>>>>>>> every dialect operators a “Library”, user can specify which >>>>>>>>>>>> libraries they >>>>>>>>>>>>>> want to use for a sql query. The builtin operators always >>> comes >>>>>>> as >>>>>>>>>>>> the >>>>>>>>>>>>>> first class objects and the others are used from the order >>> they >>>>>>>>>>>> appears. >>>>>>>>>>>>>> Maybe you can take a reference. >>>>>>>>>>>>>>>> [1] >>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28 >>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Danny Chan >>>>>>>>>>>>>>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li >>>>>>>>>>>>>>>> <bowenl...@gmail.com >>>> ,写道: >>>>>>>>>>>>>>>>> Hi folks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'd like to kick off a discussion on reworking Flink's >>>>>>>>>>>>>> FunctionCatalog. >>>>>>>>>>>>>>>>> It's critically helpful to improve function usability in >>>>>> SQL. 
>>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing >>> >>>>>>>>>>>>>>>>> In short, it: >>>>>>>>>>>>>>>>> - adds support for precise function reference with >>>>>>>> fully/partially >>>>>>>>>>>>>>>>> qualified name >>>>>>>>>>>>>>>>> - redefines function resolution order for ambiguous >>> function >>>>>>>>>>>>>> reference >>>>>>>>>>>>>>>>> - adds support for Hive's rich built-in functions >>>>>>>>>>>>>>>>> (support >>>>>> for >>>>>>>>>>>> Hive >>>>>>>>>>>>>> user >>>>>>>>>>>>>>>>> defined functions was already added in 1.9.0) >>>>>>>>>>>>>>>>> - clarifies the concept of temporary functions >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Would love to hear your thoughts. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Bowen >>>>>> -- >>>>>> Xuefu Zhang >>>>>> >>>>>> "In Honey We Trust!" >>>>>> >>>> >>> -- >>> Xuefu Zhang >>> >>> "In Honey We Trust!" >>> >