Hi Xuefu, Thank you for your answers.
Let me summarize my understanding. In principle we differ only in regards to the fact if a temporary function can be only 1-part or only 3-part identified. I can reconfirm that if the community decides it prefers the 1-part approach I will commit to that, with the assumption that we will force ONLY 1-part function names. (We will parse identifier and throw exception if a user tries to register e.g. db.temp_func). My preference is though the 3-part approach: * there are some functions that it makes no sense to override, e.g. CAST, moreover I'm afraid that allowing overriding such will lead to high inconsistency, similar to those that I mentioned spark has * you cannot shadow a fully-qualified function. (If a user fully qualifies his/her objects in a SQL query, which is often considered a good practice) * it does not differentiate between functions & temporary functions. Temporary functions just differ with regards to their life-cycle. The registration & usage is exactly the same. As it can be seen, the proposed concept regarding temp function and function resolution is quite simple. Both approaches are equally simple. I would even say the 3-part approach is slightly simpler as it does not have to care about some special built-in functions such as CAST. I don't want to express my opinion on the differentiation between built-in functions and "external" built-in functions in this thread as it is rather orthogonal, but I also like the modular approach and I definitely don't like the special syntax "cat::function". I think it's better to stick to a standard or at least other proved solutions from other systems. Best, Dawid On 05/09/2019 10:12, Xuefu Z wrote: > Hi David, > > Thanks for sharing your thoughts and request for clarifications. I believe > that I fully understood your proposal, which does has its merit. However, > it's different from ours. Here are the answers to your questions: > > Re #1: yes, the temp functions in the proposal are global and have just > one-part names, similar to built-in functions. Two- or three-part names are > not allowed. > > Re #2: not applicable as two- or three-part names are disallowed. > > Re #3: same as above. Referencing external built-in functions is achieved > either implicitly (only the built-in functions in the current catalogs are > considered) or via special syntax such as cat::function. However, we are > looking into the modular approach that Time suggested with other feedback > received from the community. > > Re #4: the resolution order goes like the following in our proposal: > > 1. temporary functions > 2. bulit-in functions (including those augmented by add-on modules) > 3. built-in functions in current catalog (this will not be needed if the > special syntax "cat::function" is required) > 4. functions in current catalog and db. > > If we go with the modular approach and make external built-in functions as > an add-on module, the 2 and 3 above will be combined. In essence, the > resolution order is equivalent in the two approaches. > > By the way, resolution order matters only for simple name reference. For > names such as db.function (interpreted as current_cat/db/function) or > cat.db.function, the reference is unambiguous, so on resolution is needed. > > As it can be seen, the proposed concept regarding temp function and > function resolution is quite simple. Additionally, the proposed resolution > order allows temp function to shadow a built-in function, which is > important (though not decisive) in our opinion. > > I started liking the modular approach as the resolution order will only > include 1, 2, and 4, which is simpler and more generic. That's why I > suggested we look more into this direction. > > Please let me know if there are further questions. > > Thanks, > Xuefu > > > > > On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dwysakow...@apache.org> > wrote: > >> Hi Xuefu, >> >> Just wanted to summarize my opinion on the one topic (temporary functions). >> >> My preference would be to make temporary functions always 3-part qualified >> (as a result that would prohibit overriding built-in functions). Having >> said that if the community decides that it's better to allow overriding >> built-in functions I am fine with it and can commit to that decision. >> >> I wanted to ask if you could clarify a few points for me around that >> option. >> >> 1. Would you enforce temporary functions to be always just a single >> name (without db & cat) as hive does, or would you allow also 3 or even 2 >> part identifiers? >> 2. Assuming 2/3-part paths. How would you register a function from a >> following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow >> all functions named 'func' in all databases named 'db' in all catalogs? Or >> would you shadow only function 'func' in database 'db' in current catalog? >> 3. This point is still under discussion, but was mentioned a few >> times, that maybe we want to enable syntax cat.func for "external built-in >> functions". How would that affect statement from previous point? Would >> 'db.func' shadow "external built-in function" in 'db' catalog or user >> functions as in point 2? Or maybe both? >> 4. Lastly in fact to summarize the previous points. Assuming 2/3-part >> paths. Would the function resolution be actually as follows?: >> 1. temporary functions (1-part path) >> 2. built-in functions >> 3. temporary functions (2-part path) >> 4. 2-part catalog functions a.k.a. "external built-in functions" >> (cat + func) - this is still under discussion, if we want that in the >> other >> focal point >> 5. temporary functions (3-part path) >> 6. 3-part catalog functions a.k.a. user functions >> >> I would be really grateful if you could explain me those questions, thanks. >> >> BTW, Thank you all for a healthy discussion. >> >> Best, >> >> Dawid >> On 04/09/2019 23:25, Xuefu Z wrote: >> >> Thank all for the sharing thoughts. I think we have gathered some useful >> initial feedback from this long discussion with a couple of focal points >> sticking out. >> >> We will go back to do more research and adapt our proposal. Once it's >> ready, we will ask for a new round of review. If there is any disagreement, >> we will start a new discussion thread on each rather than having a mega >> discussion like this. >> >> Thanks to everyone for participating. >> >> Regards, >> Xuefu >> >> >> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bowenl...@gmail.com> >> <bowenl...@gmail.com> wrote: >> >> >> Let me try to summarize and conclude the long thread so far: >> >> 1. For order of temp function v.s. built-in function: >> >> I think Dawid's point that temp function should be of fully qualified path >> is a better reasoning to back the newly proposed order, and i agree we >> don't need to follow Hive/Spark. >> >> However, I'd rather not change fundamentals of temporary functions in this >> FLIP. It belongs to a bigger story of how temporary objects should be >> redefined and be handled uniformly - currently temporary tables and views >> (those registered from TableEnv#registerTable()) behave different than what >> Dawid propose for temp functions, and we need a FLIP to just unify their >> APIs and behaviors. >> >> I agree that backward compatibility is not an issue w.r.t Jark's points. >> >> ***Seems we do have consensus that it's acceptable to prevent users >> registering a temp function in the same name as a built-in function. To >> help us move forward, I'd like to propose setting such a restraint on temp >> functions in this FLIP to simplify the design and avoid disputes.*** It >> will also leave rooms for improvements in the future. >> >> >> 2. For Hive built-in function: >> >> Thanks Timo for providing the Presto and Postgres examples. I feel modular >> built-in functions can be a good fit for the geo and ml example as a native >> Flink extension, but not sure if it fits well with external integrations. >> Anyway, I think modular built-in functions is a bigger story and can be on >> its own thread too, and our proposal doesn't prevent Flink from doing that >> in the future. >> >> ***Seems we have consensus that users should be able to use built-in >> functions of Hive or other external systems in SQL explicitly and >> deterministically regardless of Flink built-in functions and the potential >> modular built-in functions, via some new syntax like "mycat::func"? If so, >> I'd like to propose removing Hive built-in functions from ambiguous >> function resolution order, and empower users with such a syntax. This way >> we sacrifice a little convenience for certainty*** >> >> >> What do you think? >> >> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dwysakow...@apache.org> >> <dwysakow...@apache.org> >> wrote: >> >> >> Hi, >> >> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just >> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they >> >> are >> >> very inconsistent in that manner (spark being way worse on that). >> >> Hive: >> >> You cannot overwrite all the built-in functions. I could overwrite most >> >> of >> >> the functions I tried e.g. length, e, pi, round, rtrim, but there are >> functions I cannot overwrite e.g. CAST, ARRAY I get: >> >> >> * ParseException line 1:29 cannot recognize input near 'array' 'AS' * >> >> What is interesting is that I cannot ovewrite *array*, but I can ovewrite >> *map* or *struct*. Though hive behaves reasonable well if I manage to >> overwrite a function. When I drop the temporary function the native >> function is still available. >> >> Spark: >> >> Spark's behavior imho is super bad. >> >> Theoretically I could overwrite all functions. I was able e.g. to >> overwrite CAST function. I had to use though CREATE OR REPLACE TEMPORARY >> FUNCTION syntax. Otherwise I get an exception that a function already >> exists. However when I used the CAST function in a query it used the >> native, built-in one. >> >> When I overwrote current_date() function, it was used in a query, but it >> completely replaces the built-in function and I can no longer use the >> native function in any way. I cannot also drop the temporary function. I >> get: >> >> * Error in query: Cannot drop native function 'current_date';* >> >> Additional note, both systems do not allow creating TEMPORARY FUNCTIONS >> with a database. Temporary functions are always represented as a single >> name. >> >> In my opinion neither of the systems have consistent behavior. Generally >> speaking I think overwriting any system provided functions is just >> dangerous. >> >> Regarding Jark's concerns. Such functions would be registered in a >> >> current >> >> catalog/database schema, so a user could still use its own function, but >> would have to fully qualify the function (because built-in functions take >> precedence). Moreover users would have the same problem with permanent >> functions. Imagine a user have a permanent function 'cat.db.explode'. In >> 1.9 the user could use just the 'explode' function as long as the 'cat' & >> 'db' were the default catalog & database. If we introduce 'explode' >> built-in function in 1.10, the user has to fully qualify the function. >> >> Best, >> >> Dawid >> On 04/09/2019 15:19, Timo Walther wrote: >> >> Hi all, >> >> thanks for the healthy discussion. It is already a very long discussion >> with a lot of text. So I will just post my opinion to a couple of >> statements: >> >> >> Hive built-in functions are not part of Flink built-in functions, they >> >> are catalog functions >> >> That is not entirely true. Correct me if I'm wrong but I think Hive >> built-in functions are also not catalog functions. They are not stored in >> every Hive metastore catalog that is freshly created but are a set of >> functions that are listed somewhere and made available. >> >> >> ambiguous functions reference just shouldn't be resolved to a different >> >> catalog >> >> I agree. They should not be resolved to a different catalog. That's why I >> am suggesting to split the concept of built-in functions and catalog >> >> lookup >> >> semantics. >> >> >> I don't know if any other databases handle built-in functions like that >> >> What I called "module" is: >> - Extension in Postgres [1] >> - Plugin in Presto [2] >> >> Btw. Presto even mentions example modules that are similar to the ones >> that we will introduce in the near future both for ML and System XYZ >> compatibility: >> "See either the presto-ml module for machine learning functions or the >> presto-teradata-functions module for Teradata-compatible functions, both >> >> in >> >> the root of the Presto source." >> >> >> functions should be either built-in already or just libraries >> >> functions, >> >> and library functions can be adapted to catalog APIs or of some other >> syntax to use >> >> Regarding "built-in already", of course we can add a lot of functions as >> built-ins but we will end-up in a dependency hell in the near future if >> >> we >> >> don't introduce a pluggable approach. Library functions is what you also >> suggest but storing them in a catalog means to always fully qualify them >> >> or >> >> modifying the existing catalog design that was inspired by the standard. >> >> I don't think "it brings in even more complicated scenarios to the >> design", it just does clear separation of concerns. Integrating the >> functionality into the current design makes the catalog API more >> complicated. >> >> >> why would users name a temporary function the same as a built-in >> >> function then? >> >> Because you never know what users do. If they don't, my suggested >> resolution order should not be a problem, right? >> >> >> I don't think hive functions deserves be a function module >> >> Our goal is not to create a Hive clone. We need to think forward and Hive >> is just one of many systems that we can support. Not every built-in >> function behaves and will behave exactly like Hive. >> >> >> regarding temporary functions, there are few systems that support it >> >> IMHO Spark and Hive are not always the best examples for consistent >> design. Systems like Postgres, Presto, or SQL Server should be used as a >> reference. I don't think that a user can overwrite a built-in function >> there. >> >> Regards, >> Timo >> >> [1] https://www.postgresql.org/docs/10/extend-extensions.html >> [2] https://prestodb.github.io/docs/current/develop/functions.html >> >> >> On 04.09.19 13:44, Jark Wu wrote: >> >> Hi all, >> >> Regarding #1 temp function <> built-in function and naming. >> I'm fine with temp functions should precede built-in function and can >> override built-in functions (we already support to override built-in >> function in 1.9). >> If we don't allow the same name as a built-in function, I'm afraid we >> >> will >> >> have compatibility issues in the future. >> Say users register a user defined function named "explode" in 1.9, and we >> support a built-in "explode" function in 1.10. >> Then the user's jobs which call the registered "explode" function in 1.9 >> will all fail in 1.10 because of naming conflict. >> >> Regarding #2 "External" built-in functions. >> I think if we store external built-in functions in catalog, then >> "hive1::sqrt" is a good way to go. >> However, I would prefer to support a discovery mechanism (e.g. SPI) for >> built-in functions as Timo suggested above. >> This gives us the flexibility to add Hive or MySQL or Geo or whatever >> function set as built-in functions in an easy way. >> >> Best, >> Jark >> >> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <usxu...@gmail.com> >> <usxu...@gmail.com><usxu...@gmail.com> <usxu...@gmail.com> wrote: >> >> Hi David, >> >> Thank you for sharing your findings. It seems to me that there is no SQL >> standard regarding temporary functions. There are few systems that >> >> support >> >> it. Here are what I have found: >> >> 1. Hive: no DB qualifier allowed. Can overwrite built-in. >> 2. Spark: basically follows Hive ( >> >> >> >> >> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html >> >> ) >> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting >> behavior. >> (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html >> >> ) >> >> Because of lack of standard, it's perfectly fine for Flink to define >> whatever it sees appropriate. Thus, your proposal (no overwriting and >> >> must >> >> have DB as holder) is one option. The advantage is simplicity, The >> downside >> is the deviation from Hive, which is popular and de facto standard in big >> data world. >> >> However, I don't think we have to follow Hive. More importantly, we need >> >> a >> >> consensus. I have no objection if your proposal is generally agreed upon. >> >> Thanks, >> Xuefu >> >> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dwysakow...@apache.org >> <dwysakow...@apache.org> <dwysakow...@apache.org> >> wrote: >> >> Hi all, >> >> Just an opinion on the built-in <> temporary functions resolution and >> NAMING issue. I think we should not allow overriding the built-in >> functions, as this may pose serious issues and to be honest is rather >> not feasible and would require major rework. What happens if a user >> wants to override CAST? Calls to that function are generated at >> different layers of the stack that unfortunately does not always go >> through the Catalog API (at least yet). Moreover from what I've checked >> no other systems allow overriding the built-in functions. All the >> systems I've checked so far register temporary functions in a >> database/schema (either special database for temporary functions, or >> just current database). What I would suggest is to always register >> temporary functions with a 3 part identifier. The same way as tables, >> views etc. This effectively means you cannot override built-in >> functions. With such approach it is natural that the temporary functions >> end up a step lower in the resolution order: >> >> 1. built-in functions (1 part, maybe 2? - this is still under discussion) >> >> 2. temporary functions (always 3 part path) >> >> 3. catalog functions (always 3 part path) >> >> Let me know what do you think. >> >> Best, >> >> Dawid >> >> On 04/09/2019 06:13, Bowen Li wrote: >> >> Hi, >> >> I agree with Xuefu that the main controversial points are mainly the >> >> two >> >> places. My thoughts on them: >> >> 1) Determinism of referencing Hive built-in functions. We can either >> >> remove >> >> Hive built-in functions from ambiguous function resolution and require >> users to use special syntax for their qualified names, or add a config >> >> flag >> >> to catalog constructor/yaml for turning on and off Hive built-in >> >> functions >> >> with the flag set to 'false' by default and proper doc added to help >> >> users >> >> make their decisions. >> >> 2) Flink temp functions v.s. Flink built-in functions in ambiguous >> >> function >> >> resolution order. We believe Flink temp functions should precede Flink >> built-in functions, and I have presented my reasons. Just in case if we >> cannot reach an agreement, I propose forbid users registering temp >> functions in the same name as a built-in function, like MySQL's >> >> approach, >> >> for the moment. It won't have any performance concern, since built-in >> functions are all in memory and thus cost of a name check will be >> >> really >> >> trivial. >> >> >> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <usxu...@gmail.com> >> <usxu...@gmail.com><usxu...@gmail.com> <usxu...@gmail.com> wrote: >> >> From what I have seen, there are a couple of focal disagreements: >> >> 1. Resolution order: temp function --> flink built-in function --> >> >> catalog >> >> function vs flink built-in function --> temp function -> catalog >> >> function. >> >> 2. "External" built-in functions: how to treat built-in functions in >> external system and how users reference them >> >> For #1, I agree with Bowen that temp function needs to be at the >> >> highest >> >> priority because that's how a user might overwrite a built-in function >> without referencing a persistent, overwriting catalog function with a >> >> fully >> >> qualified name. Putting built-in functions at the highest priority >> eliminates that usage. >> >> For #2, I saw a general agreement on referencing "external" built-in >> functions such as those in Hive needs to be explicit and deterministic >> >> even >> >> though different approaches are proposed. To limit the scope and >> >> simply >> >> the >> >> usage, it seems making sense to me to introduce special syntax for >> >> user to >> >> explicitly reference an external built-in function such as hive1::sqrt >> >> or >> >> hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API >> >> call >> >> hive1.getFunction(ObjectPath functionName) where the database name is >> absent for bulit-in functions available in that catalog hive1. I >> >> understand >> >> that Bowen's original proposal was trying to avoid this, but this >> >> could >> >> turn out to be a clean and simple solution. >> >> (Timo's modular approach is great way to "expand" Flink's built-in >> >> function >> >> set, which seems orthogonal and complementary to this, which could be >> tackled in further future work.) >> >> I'd be happy to hear further thoughts on the two points. >> >> Thanks, >> Xuefu >> >> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <ykt...@gmail.com> >> <ykt...@gmail.com><ykt...@gmail.com> <ykt...@gmail.com> wrote: >> >> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is >> >> the >> >> same >> as Bowen's. But after thinking about it, I'm currently lean to Timo's >> suggestion. >> >> The reason is backward compatibility. If we follow Bowen's approach, >> >> let's >> >> say we >> first find function in Flink's built-in functions, and then hive's >> built-in. For example, `foo` >> is not supported by Flink, but hive has such built-in function. So >> >> user >> >> will have hive's >> behavior for function `foo`. And in next release, Flink realize this >> >> is a >> >> very popular function >> and add it into Flink's built-in functions, but with different >> >> behavior >> >> as >> >> hive's. So in next >> release, the behavior changes. >> >> With Timo's approach, IIUC user have to tell the framework explicitly >> >> what >> >> kind of >> built-in functions he would like to use. He can just tell framework >> >> to >> >> abandon Flink's built-in >> functions, and use hive's instead. User can only choose between them, >> >> but >> >> not use >> them at the same time. I think this approach is more predictable. >> >> Best, >> Kurt >> >> >> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bowenl...@gmail.com> >> <bowenl...@gmail.com><bowenl...@gmail.com> <bowenl...@gmail.com> wrote: >> >> Hi all, >> >> Thanks for the feedback. Just a kindly reminder that the [Proposal] >> >> section >> >> in the google doc was updated, please take a look first and let me >> >> know >> >> if >> >> you have more questions. >> >> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bowenl...@gmail.com> >> <bowenl...@gmail.com><bowenl...@gmail.com> <bowenl...@gmail.com> >> >> wrote: >> >> Hi Timo, >> >> Re> 1) We should not have the restriction "hive built-in functions >> >> can >> >> only >> >> be used when current catalog is hive catalog". Switching a catalog >> should only have implications on the cat.db.object resolution but >> >> not >> >> functions. It would be quite convinient for users to use Hive >> >> built-ins >> >> even if they use a Confluent schema registry or just the in-memory >> >> catalog. >> >> There might be a misunderstanding here. >> >> First of all, Hive built-in functions are not part of Flink >> >> built-in >> >> functions, they are catalog functions, thus if the current catalog >> >> is >> >> not a >> >> HiveCatalog but, say, a schema registry catalog, ambiguous >> >> functions >> >> reference just shouldn't be resolved to a different catalog. >> >> Second, Hive built-in functions can potentially be referenced >> >> across >> >> catalog, but it doesn't have db namespace and we currently just >> >> don't >> >> have >> >> a SQL syntax for it. It can be enabled when such a SQL syntax is >> >> defined, >> >> e.g. "catalog::function", but it's out of scope of this FLIP. >> >> 2) I would propose to have separate concepts for catalog and >> >> built-in >> >> functions. In particular it would be nice to modularize built-in >> functions. Some built-in functions are very crucial (like AS, CAST, >> MINUS), others are more optional but stable (MD5, CONCAT_WS), and >> >> maybe >> >> we add more experimental functions in the future or function for >> >> some >> >> special application area (Geo functions, ML functions). A data >> >> platform >> >> team might not want to make every built-in function available. Or a >> function module like ML functions is in a different Maven module. >> >> I think this is orthogonal to this FLIP, especially we don't have >> >> the >> >> "external built-in functions" anymore and currently the built-in >> >> function >> >> category remains untouched. >> >> But just to share some thoughts on the proposal, I'm not sure about >> >> it: >> >> - I don't know if any other databases handle built-in functions >> >> like >> >> that. >> >> Maybe you can give some examples? IMHO, built-in functions are >> >> system >> >> info >> >> and should be deterministic, not depending on loaded libraries. Geo >> functions should be either built-in already or just libraries >> >> functions, >> >> and library functions can be adapted to catalog APIs or of some >> >> other >> >> syntax to use >> - I don't know if all use cases stand, and many can be achieved by >> >> other >> >> approaches too. E.g. experimental functions can be taken good care >> >> of >> >> by >> >> documentations, annotations, etc >> - the proposal basically introduces some concept like a pluggable >> >> built-in >> >> function catalog, despite the already existing catalog APIs >> - it brings in even more complicated scenarios to the design. E.g. >> >> how >> >> do >> >> you handle built-in functions in different modules but different >> >> names? >> >> In short, I'm not sure if it really stands and it looks like an >> >> overkill >> >> to me. I'd rather not go to that route. Related discussion can be >> >> on >> >> its >> >> own thread. >> >> 3) Following the suggestion above, we can have a separate discovery >> mechanism for built-in functions. Instead of just going through a >> >> static >> >> list like in BuiltInFunctionDefinitions, a platform team should be >> >> able >> >> to select function modules like >> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions, >> HiveFunctions) or via service discovery; >> >> Same as above. I'll leave it to its own thread. >> >> re > 3) Dawid and I discussed the resulution order again. I agree >> >> with >> >> Kurt >> >> that we should unify built-in function (external or internal) >> >> under a >> >> common layer. However, the resolution order should be: >> 1. built-in functions >> 2. temporary functions >> 3. regular catalog resolution logic >> Otherwise a temporary function could cause clashes with Flink's >> >> built-in >> >> functions. If you take a look at other vendors, like SQL Server >> >> they >> >> also do not allow to overwrite built-in functions. >> >> ”I agree with Kurt that we should unify built-in function (external >> >> or >> >> internal) under a common layer.“ <- I don't think this is what Kurt >> >> means. >> >> Kurt and I are in favor of unifying built-in functions of external >> >> systems >> >> and catalog functions. Did you type a mistake? >> >> Besides, I'm not sure about the resolution order you proposed. >> >> Temporary >> >> functions have a lifespan over a session and are only visible to >> >> the >> >> session owner, they are unique to each user, and users create them >> >> on >> >> purpose to be the highest priority in order to overwrite system >> >> info >> >> (built-in functions in this case). >> >> In your case, why would users name a temporary function the same >> >> as a >> >> built-in function then? Since using that name in ambiguous function >> reference will always be resolved to built-in functions, creating a >> same-named temp function would be meaningless in the end. >> >> >> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bowenl...@gmail.com> >> <bowenl...@gmail.com><bowenl...@gmail.com> <bowenl...@gmail.com> >> >> wrote: >> >> Hi Jingsong, >> >> Re> 1.Hive built-in functions is an intermediate solution. So we >> >> should >> >> not introduce interfaces to influence the framework. To make >> Flink itself more powerful, we should implement the functions >> we need to add. >> >> Yes, please see the doc. >> >> Re> 2.Non-flink built-in functions are easy for users to change >> >> their >> >> behavior. If we support some flink built-in functions in the >> future but act differently from non-flink built-in, this will >> >> lead >> >> to >> >> changes in user behavior. >> >> There's no such concept as "external built-in functions" any more. >> Built-in functions of external systems will be treated as special >> >> catalog >> >> functions. >> >> Re> Another question is, does this fallback include all >> >> hive built-in functions? As far as I know, some hive functions >> have some hacky. If possible, can we start with a white list? >> Once we implement some functions to flink built-in, we can >> also update the whitelist. >> >> Yes, that's something we thought of too. I don't think it's super >> critical to the scope of this FLIP, thus I'd like to leave it to >> >> future >> >> efforts as a nice-to-have feature. >> >> >> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bowenl...@gmail.com> >> <bowenl...@gmail.com><bowenl...@gmail.com> <bowenl...@gmail.com> >> >> wrote: >> >> Hi Kurt, >> >> Re: > What I want to propose is we can merge #3 and #4, make them >> >> both >> >> under >> >> "catalog" concept, by extending catalog function to make it have >> >> ability to >> >> have built-in catalog functions. Some benefits I can see from >> >> this >> >> approach: >> >> 1. We don't have to introduce new concept like external built-in >> >> functions. >> >> Actually I don't see a full story about how to treat a built-in >> >> functions, and it >> >> seems a little bit disrupt with catalog. As a result, you have >> >> to >> >> make >> >> some restriction >> >> like "hive built-in functions can only be used when current >> >> catalog >> >> is >> >> hive catalog". >> >> Yes, I've unified #3 and #4 but it seems I didn't update some >> >> part >> >> of >> >> the doc. I've modified those sections, and they are up to date >> >> now. >> >> In short, now built-in function of external systems are defined >> >> as >> >> a >> >> special kind of catalog function in Flink, and handled by Flink >> >> as >> >> following: >> - An external built-in function must be associated with a catalog >> >> for >> >> the purpose of decoupling flink-table and external systems. >> - It always resides in front of catalog functions in ambiguous >> >> function >> >> reference order, just like in its own external system >> - It is a special catalog function that doesn’t have a >> >> schema/database >> >> namespace >> - It goes thru the same instantiation logic as other user defined >> catalog functions in the external system >> >> Please take another look at the doc, and let me know if you have >> >> more >> >> questions. >> >> >> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <twal...@apache.org> >> <twal...@apache.org><twal...@apache.org> <twal...@apache.org> >> >> wrote: >> >> Hi Kurt, >> >> it should not affect the functions and operations we currently >> >> have >> >> in >> >> SQL. It just categorizes the available built-in functions. It is >> >> kind >> >> of >> an orthogonal concept to the catalog API but built-in functions >> >> deserve >> >> this special kind of treatment. CatalogFunction still fits >> >> perfectly >> >> in >> >> there because the regular catalog object resolution logic is not >> affected. So tables and functions are resolved in the same way >> >> but >> >> with >> >> built-in functions that have priority as in the original design. >> >> Regards, >> Timo >> >> >> On 03.09.19 15:26, Kurt Young wrote: >> >> Does this only affect the functions and operations we currently >> >> have >> >> in SQL >> >> and >> have no effect on tables, right? Looks like this is an >> >> orthogonal >> >> concept >> >> with Catalog? >> If the answer are both yes, then the catalog function will be a >> >> weird >> >> concept? >> >> Best, >> Kurt >> >> >> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan < >> yuzhao....@gmail.com >> >> wrote: >> >> The way you proposed are basically the same as what Calcite >> >> does, I >> >> think >> >> we are in the same line. >> >> Best, >> Danny Chan >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <twal...@apache.org >> >> ,写道: >> >> This sounds exactly as the module approach I mentioned, no? >> >> Regards, >> Timo >> >> On 03.09.19 13:42, Danny Chan wrote: >> >> Thanks Bowen for bring up this topic, I think it’s a useful >> >> refactoring to make our function usage more user friendly. >> >> For the topic of how to organize the builtin operators and >> >> operators >> >> of Hive, here is a solution from Apache Calcite, the Calcite >> >> way >> >> is >> >> to make >> >> every dialect operators a “Library”, user can specify which >> >> libraries they >> >> want to use for a sql query. The builtin operators always >> >> comes >> >> as >> >> the >> >> first class objects and the others are used from the order >> >> they >> >> appears. >> >> Maybe you can take a reference. >> >> [1] >> >> >> >> >> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28 >> >> Best, >> Danny Chan >> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bowenl...@gmail.com >> >> ,写道: >> >> Hi folks, >> >> I'd like to kick off a discussion on reworking Flink's >> >> FunctionCatalog. >> >> It's critically helpful to improve function usability in >> >> SQL. >> >> >> >> >> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing >> >> In short, it: >> - adds support for precise function reference with >> >> fully/partially >> >> qualified name >> - redefines function resolution order for ambiguous >> >> function >> >> reference >> >> - adds support for Hive's rich built-in functions (support >> >> for >> >> Hive >> >> user >> >> defined functions was already added in 1.9.0) >> - clarifies the concept of temporary functions >> >> Would love to hear your thoughts. >> >> Bowen >> >> -- >> Xuefu Zhang >> >> "In Honey We Trust!" >> >> >> -- >> Xuefu Zhang >> >> "In Honey We Trust!" >> >> >> >> >>
signature.asc
Description: OpenPGP digital signature