Hi,

Regarding the Hive & Spark support of TEMPORARY FUNCTIONS: I've just performed some experiments (hive-2.3.2 & spark-2.4.4) and I think the two systems are quite inconsistent in this area (Spark being considerably worse in this respect).
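The sessions I ran looked roughly like the following sketch (a sketch only; 'com.example.MyFn' is a placeholder for whatever UDF class gets registered, not the actual class I used):

    -- Hive 2.3.2
    CREATE TEMPORARY FUNCTION length AS 'com.example.MyFn';  -- works, shadows the built-in length
    DROP TEMPORARY FUNCTION length;                          -- native length is available again
    CREATE TEMPORARY FUNCTION array AS 'com.example.MyFn';   -- fails at parse time (see below)

    -- Spark 2.4.4
    CREATE TEMPORARY FUNCTION cast AS 'com.example.MyFn';            -- fails: function already exists
    CREATE OR REPLACE TEMPORARY FUNCTION cast AS 'com.example.MyFn'; -- succeeds, but queries still use the built-in CAST
    CREATE OR REPLACE TEMPORARY FUNCTION current_date AS 'com.example.MyFn';  -- replaces the built-in entirely
    DROP TEMPORARY FUNCTION current_date;                            -- fails: cannot drop native function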
Hive: You cannot overwrite all of the built-in functions. I could overwrite most of the functions I tried, e.g. length, e, pi, round, rtrim, but there are functions I cannot overwrite, e.g. CAST and ARRAY, for which I get: /ParseException line 1:29 cannot recognize input near 'array' 'AS'/. What is interesting is that I cannot overwrite /array/, but I can overwrite /map/ or /struct/. Still, Hive behaves reasonably well once I do manage to overwrite a function: when I drop the temporary function, the native function is available again.

Spark: Spark's behavior is, imho, much worse. In theory I could overwrite all functions; I was able, for example, to overwrite the CAST function, though I had to use the CREATE OR REPLACE TEMPORARY FUNCTION syntax, otherwise I get an exception that the function already exists. However, when I then used CAST in a query, the native, built-in one was used. When I overwrote the current_date() function, my version was used in queries, but it completely replaces the built-in function and I can no longer use the native function in any way. I also cannot drop the temporary function; I get: /Error in query: Cannot drop native function 'current_date';/

As an additional note, neither system allows creating TEMPORARY FUNCTIONS with a database qualifier; temporary functions are always represented as a single name.

In my opinion, neither of the systems has consistent behavior here. Generally speaking, I think overwriting any system-provided functions is just dangerous.

Regarding Jark's concerns: such functions would be registered in the current catalog/database schema, so a user could still use their own function, but would have to fully qualify it (because built-in functions take precedence). Moreover, users would have the same problem with permanent functions. Imagine a user has a permanent function 'cat.db.explode'. In 1.9 the user could use just 'explode' as long as 'cat' & 'db' were the default catalog & database. If we introduce an 'explode' built-in function in 1.10, the user has to fully qualify the function.

Best,
Dawid

On 04/09/2019 15:19, Timo Walther wrote:
> Hi all, > > thanks for the healthy discussion. It is already a very long > discussion with a lot of text. So I will just post my opinion to a > couple of statements: > > > Hive built-in functions are not part of Flink built-in functions, > they are catalog functions > > That is not entirely true. Correct me if I'm wrong but I think Hive > built-in functions are also not catalog functions. They are not stored > in every Hive metastore catalog that is freshly created but are a set > of functions that are listed somewhere and made available. > > > ambiguous functions reference just shouldn't be resolved to a > different catalog > > I agree. They should not be resolved to a different catalog. That's > why I am suggesting to split the concept of built-in functions and > catalog lookup semantics. > > > I don't know if any other databases handle built-in functions like that > > What I called "module" is: > - Extension in Postgres [1] > - Plugin in Presto [2] > > Btw. Presto even mentions example modules that are similar to the ones > that we will introduce in the near future both for ML and System XYZ > compatibility: > "See either the presto-ml module for machine learning functions or the > presto-teradata-functions module for Teradata-compatible functions, > both in the root of the Presto source."
> > > functions should be either built-in already or just libraries > functions, and library functions can be adapted to catalog APIs or of > some other syntax to use > > Regarding "built-in already", of course we can add a lot of functions > as built-ins but we will end-up in a dependency hell in the near > future if we don't introduce a pluggable approach. Library functions > is what you also suggest but storing them in a catalog means to always > fully qualify them or modifying the existing catalog design that was > inspired by the standard. > > I don't think "it brings in even more complicated scenarios to the > design", it just does clear separation of concerns. Integrating the > functionality into the current design makes the catalog API more > complicated. > > > why would users name a temporary function the same as a built-in > function then? > > Because you never know what users do. If they don't, my suggested > resolution order should not be a problem, right? > > > I don't think hive functions deserves be a function module > > Our goal is not to create a Hive clone. We need to think forward and > Hive is just one of many systems that we can support. Not every > built-in function behaves and will behave exactly like Hive. > > > regarding temporary functions, there are few systems that support it > > IMHO Spark and Hive are not always the best examples for consistent > design. Systems like Postgres, Presto, or SQL Server should be used as > a reference. I don't think that a user can overwrite a built-in > function there. > > Regards, > Timo > > [1] https://www.postgresql.org/docs/10/extend-extensions.html > [2] https://prestodb.github.io/docs/current/develop/functions.html > > > On 04.09.19 13:44, Jark Wu wrote: >> Hi all, >> >> Regarding #1 temp function <> built-in function and naming. >> I'm fine with temp functions should precede built-in function and can >> override built-in functions (we already support to override built-in >> function in 1.9). >> If we don't allow the same name as a built-in function, I'm afraid we >> will >> have compatibility issues in the future. >> Say users register a user defined function named "explode" in 1.9, >> and we >> support a built-in "explode" function in 1.10. >> Then the user's jobs which call the registered "explode" function in 1.9 >> will all fail in 1.10 because of naming conflict. >> >> Regarding #2 "External" built-in functions. >> I think if we store external built-in functions in catalog, then >> "hive1::sqrt" is a good way to go. >> However, I would prefer to support a discovery mechanism (e.g. SPI) for >> built-in functions as Timo suggested above. >> This gives us the flexibility to add Hive or MySQL or Geo or whatever >> function set as built-in functions in an easy way. >> >> Best, >> Jark >> >> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <usxu...@gmail.com> wrote: >> >>> Hi David, >>> >>> Thank you for sharing your findings. It seems to me that there is no >>> SQL >>> standard regarding temporary functions. There are few systems that >>> support >>> it. Here are what I have found: >>> >>> 1. Hive: no DB qualifier allowed. Can overwrite built-in. >>> 2. Spark: basically follows Hive ( >>> >>> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html >>> >>> ) >>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of >>> overwriting >>> behavior. 
( >>> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html) >>> >>> >>> Because of lack of standard, it's perfectly fine for Flink to define >>> whatever it sees appropriate. Thus, your proposal (no overwriting >>> and must >>> have DB as holder) is one option. The advantage is simplicity, The >>> downside >>> is the deviation from Hive, which is popular and de facto standard >>> in big >>> data world. >>> >>> However, I don't think we have to follow Hive. More importantly, we >>> need a >>> consensus. I have no objection if your proposal is generally agreed >>> upon. >>> >>> Thanks, >>> Xuefu >>> >>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz >>> <dwysakow...@apache.org> >>> wrote: >>> >>>> Hi all, >>>> >>>> Just an opinion on the built-in <> temporary functions resolution and >>>> NAMING issue. I think we should not allow overriding the built-in >>>> functions, as this may pose serious issues and to be honest is rather >>>> not feasible and would require major rework. What happens if a user >>>> wants to override CAST? Calls to that function are generated at >>>> different layers of the stack that unfortunately does not always go >>>> through the Catalog API (at least yet). Moreover from what I've >>>> checked >>>> no other systems allow overriding the built-in functions. All the >>>> systems I've checked so far register temporary functions in a >>>> database/schema (either special database for temporary functions, or >>>> just current database). What I would suggest is to always register >>>> temporary functions with a 3 part identifier. The same way as tables, >>>> views etc. This effectively means you cannot override built-in >>>> functions. With such approach it is natural that the temporary >>>> functions >>>> end up a step lower in the resolution order: >>>> >>>> 1. built-in functions (1 part, maybe 2? - this is still under >>>> discussion) >>>> >>>> 2. temporary functions (always 3 part path) >>>> >>>> 3. catalog functions (always 3 part path) >>>> >>>> Let me know what do you think. >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> On 04/09/2019 06:13, Bowen Li wrote: >>>>> Hi, >>>>> >>>>> I agree with Xuefu that the main controversial points are mainly the >>> two >>>>> places. My thoughts on them: >>>>> >>>>> 1) Determinism of referencing Hive built-in functions. We can either >>>> remove >>>>> Hive built-in functions from ambiguous function resolution and >>>>> require >>>>> users to use special syntax for their qualified names, or add a >>>>> config >>>> flag >>>>> to catalog constructor/yaml for turning on and off Hive built-in >>>> functions >>>>> with the flag set to 'false' by default and proper doc added to help >>>> users >>>>> make their decisions. >>>>> >>>>> 2) Flink temp functions v.s. Flink built-in functions in ambiguous >>>> function >>>>> resolution order. We believe Flink temp functions should precede >>>>> Flink >>>>> built-in functions, and I have presented my reasons. Just in case >>>>> if we >>>>> cannot reach an agreement, I propose forbid users registering temp >>>>> functions in the same name as a built-in function, like MySQL's >>> approach, >>>>> for the moment. It won't have any performance concern, since built-in >>>>> functions are all in memory and thus cost of a name check will be >>> really >>>>> trivial. >>>>> >>>>> >>>>> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <usxu...@gmail.com> wrote: >>>>> >>>>>> From what I have seen, there are a couple of focal disagreements: >>>>>> >>>>>> 1. 
Resolution order: temp function --> flink built-in function --> >>>> catalog >>>>>> function vs flink built-in function --> temp function -> catalog >>>> function. >>>>>> 2. "External" built-in functions: how to treat built-in functions in >>>>>> external system and how users reference them >>>>>> >>>>>> For #1, I agree with Bowen that temp function needs to be at the >>> highest >>>>>> priority because that's how a user might overwrite a built-in >>>>>> function >>>>>> without referencing a persistent, overwriting catalog function >>>>>> with a >>>> fully >>>>>> qualified name. Putting built-in functions at the highest priority >>>>>> eliminates that usage. >>>>>> >>>>>> For #2, I saw a general agreement on referencing "external" built-in >>>>>> functions such as those in Hive needs to be explicit and >>>>>> deterministic >>>> even >>>>>> though different approaches are proposed. To limit the scope and >>> simply >>>> the >>>>>> usage, it seems making sense to me to introduce special syntax for >>>> user to >>>>>> explicitly reference an external built-in function such as >>>>>> hive1::sqrt >>>> or >>>>>> hive1._built_in.sqrt. This is a DML syntax matching nicely >>>>>> Catalog API >>>> call >>>>>> hive1.getFunction(ObjectPath functionName) where the database >>>>>> name is >>>>>> absent for bulit-in functions available in that catalog hive1. I >>>> understand >>>>>> that Bowen's original proposal was trying to avoid this, but this >>> could >>>>>> turn out to be a clean and simple solution. >>>>>> >>>>>> (Timo's modular approach is great way to "expand" Flink's built-in >>>> function >>>>>> set, which seems orthogonal and complementary to this, which >>>>>> could be >>>>>> tackled in further future work.) >>>>>> >>>>>> I'd be happy to hear further thoughts on the two points. >>>>>> >>>>>> Thanks, >>>>>> Xuefu >>>>>> >>>>>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <ykt...@gmail.com> wrote: >>>>>> >>>>>>> Thanks Timo & Bowen for the feedback. Bowen was right, my >>>>>>> proposal is >>>> the >>>>>>> same >>>>>>> as Bowen's. But after thinking about it, I'm currently lean to >>>>>>> Timo's >>>>>>> suggestion. >>>>>>> >>>>>>> The reason is backward compatibility. If we follow Bowen's >>>>>>> approach, >>>>>> let's >>>>>>> say we >>>>>>> first find function in Flink's built-in functions, and then hive's >>>>>>> built-in. For example, `foo` >>>>>>> is not supported by Flink, but hive has such built-in function. So >>> user >>>>>>> will have hive's >>>>>>> behavior for function `foo`. And in next release, Flink realize >>>>>>> this >>>> is a >>>>>>> very popular function >>>>>>> and add it into Flink's built-in functions, but with different >>> behavior >>>>>> as >>>>>>> hive's. So in next >>>>>>> release, the behavior changes. >>>>>>> >>>>>>> With Timo's approach, IIUC user have to tell the framework >>>>>>> explicitly >>>>>> what >>>>>>> kind of >>>>>>> built-in functions he would like to use. He can just tell framework >>> to >>>>>>> abandon Flink's built-in >>>>>>> functions, and use hive's instead. User can only choose between >>>>>>> them, >>>> but >>>>>>> not use >>>>>>> them at the same time. I think this approach is more predictable. >>>>>>> >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bowenl...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Thanks for the feedback. 
Just a kindly reminder that the >>>>>>>> [Proposal] >>>>>>> section >>>>>>>> in the google doc was updated, please take a look first and let me >>>> know >>>>>>> if >>>>>>>> you have more questions. >>>>>>>> >>>>>>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bowenl...@gmail.com> >>> wrote: >>>>>>>>> Hi Timo, >>>>>>>>> >>>>>>>>> Re> 1) We should not have the restriction "hive built-in >>>>>>>>> functions >>>>>> can >>>>>>>>> only >>>>>>>>>> be used when current catalog is hive catalog". Switching a >>>>>>>>>> catalog >>>>>>>>>> should only have implications on the cat.db.object resolution >>>>>>>>>> but >>>>>> not >>>>>>>>>> functions. It would be quite convinient for users to use Hive >>>>>>> built-ins >>>>>>>>>> even if they use a Confluent schema registry or just the >>>>>>>>>> in-memory >>>>>>>>> catalog. >>>>>>>>> >>>>>>>>> There might be a misunderstanding here. >>>>>>>>> >>>>>>>>> First of all, Hive built-in functions are not part of Flink >>> built-in >>>>>>>>> functions, they are catalog functions, thus if the current >>>>>>>>> catalog >>> is >>>>>>>> not a >>>>>>>>> HiveCatalog but, say, a schema registry catalog, ambiguous >>> functions >>>>>>>>> reference just shouldn't be resolved to a different catalog. >>>>>>>>> >>>>>>>>> Second, Hive built-in functions can potentially be referenced >>> across >>>>>>>>> catalog, but it doesn't have db namespace and we currently just >>> don't >>>>>>>> have >>>>>>>>> a SQL syntax for it. It can be enabled when such a SQL syntax is >>>>>>> defined, >>>>>>>>> e.g. "catalog::function", but it's out of scope of this FLIP. >>>>>>>>> >>>>>>>>> 2) I would propose to have separate concepts for catalog and >>> built-in >>>>>>>>> functions. In particular it would be nice to modularize built-in >>>>>>>>> functions. Some built-in functions are very crucial (like AS, >>>>>>>>> CAST, >>>>>>>>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and >>>>>> maybe >>>>>>>>> we add more experimental functions in the future or function for >>> some >>>>>>>>> special application area (Geo functions, ML functions). A data >>>>>> platform >>>>>>>>> team might not want to make every built-in function available. >>>>>>>>> Or a >>>>>>>>> function module like ML functions is in a different Maven module. >>>>>>>>> >>>>>>>>> I think this is orthogonal to this FLIP, especially we don't have >>> the >>>>>>>>> "external built-in functions" anymore and currently the built-in >>>>>>> function >>>>>>>>> category remains untouched. >>>>>>>>> >>>>>>>>> But just to share some thoughts on the proposal, I'm not sure >>>>>>>>> about >>>>>> it: >>>>>>>>> - I don't know if any other databases handle built-in functions >>> like >>>>>>>> that. >>>>>>>>> Maybe you can give some examples? IMHO, built-in functions are >>> system >>>>>>>> info >>>>>>>>> and should be deterministic, not depending on loaded >>>>>>>>> libraries. Geo >>>>>>>>> functions should be either built-in already or just libraries >>>>>>> functions, >>>>>>>>> and library functions can be adapted to catalog APIs or of some >>> other >>>>>>>>> syntax to use >>>>>>>>> - I don't know if all use cases stand, and many can be >>>>>>>>> achieved by >>>>>>> other >>>>>>>>> approaches too. E.g. 
experimental functions can be taken good >>>>>>>>> care >>> of >>>>>>> by >>>>>>>>> documentations, annotations, etc >>>>>>>>> - the proposal basically introduces some concept like a pluggable >>>>>>>> built-in >>>>>>>>> function catalog, despite the already existing catalog APIs >>>>>>>>> - it brings in even more complicated scenarios to the design. >>>>>>>>> E.g. >>>>>> how >>>>>>> do >>>>>>>>> you handle built-in functions in different modules but different >>>>>> names? >>>>>>>>> In short, I'm not sure if it really stands and it looks like an >>>>>>> overkill >>>>>>>>> to me. I'd rather not go to that route. Related discussion can be >>> on >>>>>>> its >>>>>>>>> own thread. >>>>>>>>> >>>>>>>>> 3) Following the suggestion above, we can have a separate >>>>>>>>> discovery >>>>>>>>> mechanism for built-in functions. Instead of just going through a >>>>>>> static >>>>>>>>> list like in BuiltInFunctionDefinitions, a platform team >>>>>>>>> should be >>>>>> able >>>>>>>>> to select function modules like >>>>>>>>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions, >>>>>>>>> HiveFunctions) or via service discovery; >>>>>>>>> >>>>>>>>> Same as above. I'll leave it to its own thread. >>>>>>>>> >>>>>>>>> re > 3) Dawid and I discussed the resulution order again. I agree >>>>>> with >>>>>>>>> Kurt >>>>>>>>>> that we should unify built-in function (external or internal) >>>>>> under a >>>>>>>>>> common layer. However, the resolution order should be: >>>>>>>>>> 1. built-in functions >>>>>>>>>> 2. temporary functions >>>>>>>>>> 3. regular catalog resolution logic >>>>>>>>>> Otherwise a temporary function could cause clashes with Flink's >>>>>>>> built-in >>>>>>>>>> functions. If you take a look at other vendors, like SQL Server >>>>>> they >>>>>>>>>> also do not allow to overwrite built-in functions. >>>>>>>>> ”I agree with Kurt that we should unify built-in function >>>>>>>>> (external >>>>>> or >>>>>>>>> internal) under a common layer.“ <- I don't think this is what >>>>>>>>> Kurt >>>>>>>> means. >>>>>>>>> Kurt and I are in favor of unifying built-in functions of >>>>>>>>> external >>>>>>>> systems >>>>>>>>> and catalog functions. Did you type a mistake? >>>>>>>>> >>>>>>>>> Besides, I'm not sure about the resolution order you proposed. >>>>>>> Temporary >>>>>>>>> functions have a lifespan over a session and are only visible to >>> the >>>>>>>>> session owner, they are unique to each user, and users create >>>>>>>>> them >>> on >>>>>>>>> purpose to be the highest priority in order to overwrite system >>> info >>>>>>>>> (built-in functions in this case). >>>>>>>>> >>>>>>>>> In your case, why would users name a temporary function the same >>> as a >>>>>>>>> built-in function then? Since using that name in ambiguous >>>>>>>>> function >>>>>>>>> reference will always be resolved to built-in functions, >>>>>>>>> creating a >>>>>>>>> same-named temp function would be meaningless in the end. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bowenl...@gmail.com> >>> wrote: >>>>>>>>>> Hi Jingsong, >>>>>>>>>> >>>>>>>>>> Re> 1.Hive built-in functions is an intermediate solution. So we >>>>>>> should >>>>>>>>>>> not introduce interfaces to influence the framework. To make >>>>>>>>>>> Flink itself more powerful, we should implement the functions >>>>>>>>>>> we need to add. >>>>>>>>>> Yes, please see the doc. >>>>>>>>>> >>>>>>>>>> Re> 2.Non-flink built-in functions are easy for users to change >>>>>> their >>>>>>>>>>> behavior. 
If we support some flink built-in functions in the >>>>>>>>>>> future but act differently from non-flink built-in, this will >>> lead >>>>>>> to >>>>>>>>>>> changes in user behavior. >>>>>>>>>> There's no such concept as "external built-in functions" any >>>>>>>>>> more. >>>>>>>>>> Built-in functions of external systems will be treated as >>>>>>>>>> special >>>>>>>> catalog >>>>>>>>>> functions. >>>>>>>>>> >>>>>>>>>> Re> Another question is, does this fallback include all >>>>>>>>>>> hive built-in functions? As far as I know, some hive functions >>>>>>>>>>> have some hacky. If possible, can we start with a white list? >>>>>>>>>>> Once we implement some functions to flink built-in, we can >>>>>>>>>>> also update the whitelist. >>>>>>>>>> Yes, that's something we thought of too. I don't think it's >>>>>>>>>> super >>>>>>>>>> critical to the scope of this FLIP, thus I'd like to leave it to >>>>>>> future >>>>>>>>>> efforts as a nice-to-have feature. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bowenl...@gmail.com> >>>>>> wrote: >>>>>>>>>>> Hi Kurt, >>>>>>>>>>> >>>>>>>>>>> Re: > What I want to propose is we can merge #3 and #4, make >>>>>>>>>>> them >>>>>>> both >>>>>>>>>>> under >>>>>>>>>>>> "catalog" concept, by extending catalog function to make it >>>>>>>>>>>> have >>>>>>>>>>> ability to >>>>>>>>>>>> have built-in catalog functions. Some benefits I can see from >>> this >>>>>>>>>>> approach: >>>>>>>>>>>> 1. We don't have to introduce new concept like external >>>>>>>>>>>> built-in >>>>>>>>>>> functions. >>>>>>>>>>>> Actually I don't see a full story about how to treat a >>>>>>>>>>>> built-in >>>>>>>>>>> functions, and it >>>>>>>>>>>> seems a little bit disrupt with catalog. As a result, you have >>> to >>>>>>> make >>>>>>>>>>> some restriction >>>>>>>>>>>> like "hive built-in functions can only be used when current >>>>>> catalog >>>>>>> is >>>>>>>>>>> hive catalog". >>>>>>>>>>> >>>>>>>>>>> Yes, I've unified #3 and #4 but it seems I didn't update some >>> part >>>>>> of >>>>>>>>>>> the doc. I've modified those sections, and they are up to date >>> now. >>>>>>>>>>> In short, now built-in function of external systems are defined >>> as >>>>>> a >>>>>>>>>>> special kind of catalog function in Flink, and handled by Flink >>> as >>>>>>>>>>> following: >>>>>>>>>>> - An external built-in function must be associated with a >>>>>>>>>>> catalog >>>>>> for >>>>>>>>>>> the purpose of decoupling flink-table and external systems. >>>>>>>>>>> - It always resides in front of catalog functions in ambiguous >>>>>>> function >>>>>>>>>>> reference order, just like in its own external system >>>>>>>>>>> - It is a special catalog function that doesn’t have a >>>>>>> schema/database >>>>>>>>>>> namespace >>>>>>>>>>> - It goes thru the same instantiation logic as other user >>>>>>>>>>> defined >>>>>>>>>>> catalog functions in the external system >>>>>>>>>>> >>>>>>>>>>> Please take another look at the doc, and let me know if you >>>>>>>>>>> have >>>>>> more >>>>>>>>>>> questions. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther >>>>>>>>>>> <twal...@apache.org> >>>>>>>> wrote: >>>>>>>>>>>> Hi Kurt, >>>>>>>>>>>> >>>>>>>>>>>> it should not affect the functions and operations we currently >>>>>> have >>>>>>> in >>>>>>>>>>>> SQL. It just categorizes the available built-in functions. 
>>>>>>>>>>>> It is >>>>>>> kind >>>>>>>>>>>> of >>>>>>>>>>>> an orthogonal concept to the catalog API but built-in >>>>>>>>>>>> functions >>>>>>>> deserve >>>>>>>>>>>> this special kind of treatment. CatalogFunction still fits >>>>>> perfectly >>>>>>>> in >>>>>>>>>>>> there because the regular catalog object resolution logic >>>>>>>>>>>> is not >>>>>>>>>>>> affected. So tables and functions are resolved in the same way >>> but >>>>>>>> with >>>>>>>>>>>> built-in functions that have priority as in the original >>>>>>>>>>>> design. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 03.09.19 15:26, Kurt Young wrote: >>>>>>>>>>>>> Does this only affect the functions and operations we >>>>>>>>>>>>> currently >>>>>>> have >>>>>>>>>>>> in SQL >>>>>>>>>>>>> and >>>>>>>>>>>>> have no effect on tables, right? Looks like this is an >>>>>> orthogonal >>>>>>>>>>>> concept >>>>>>>>>>>>> with Catalog? >>>>>>>>>>>>> If the answer are both yes, then the catalog function will >>>>>>>>>>>>> be a >>>>>>>> weird >>>>>>>>>>>>> concept? >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Kurt >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan < >>> yuzhao....@gmail.com >>>>>>>>>>>> wrote: >>>>>>>>>>>>>> The way you proposed are basically the same as what Calcite >>>>>>> does, I >>>>>>>>>>>> think >>>>>>>>>>>>>> we are in the same line. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Danny Chan >>>>>>>>>>>>>> 在 2019年9月3日 +0800 PM7:57,Timo Walther >>>>>>>>>>>>>> <twal...@apache.org >>>> ,写道: >>>>>>>>>>>>>>> This sounds exactly as the module approach I mentioned, no? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 03.09.19 13:42, Danny Chan wrote: >>>>>>>>>>>>>>>> Thanks Bowen for bring up this topic, I think it’s a >>>>>>>>>>>>>>>> useful >>>>>>>>>>>>>> refactoring to make our function usage more user friendly. >>>>>>>>>>>>>>>> For the topic of how to organize the builtin operators and >>>>>>>>>>>> operators >>>>>>>>>>>>>> of Hive, here is a solution from Apache Calcite, the Calcite >>>>>> way >>>>>>> is >>>>>>>>>>>> to make >>>>>>>>>>>>>> every dialect operators a “Library”, user can specify which >>>>>>>>>>>> libraries they >>>>>>>>>>>>>> want to use for a sql query. The builtin operators always >>> comes >>>>>>> as >>>>>>>>>>>> the >>>>>>>>>>>>>> first class objects and the others are used from the order >>> they >>>>>>>>>>>> appears. >>>>>>>>>>>>>> Maybe you can take a reference. >>>>>>>>>>>>>>>> [1] >>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28 >>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Danny Chan >>>>>>>>>>>>>>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li >>>>>>>>>>>>>>>> <bowenl...@gmail.com >>>> ,写道: >>>>>>>>>>>>>>>>> Hi folks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'd like to kick off a discussion on reworking Flink's >>>>>>>>>>>>>> FunctionCatalog. >>>>>>>>>>>>>>>>> It's critically helpful to improve function usability in >>>>>> SQL. 
>>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing >>> >>>>>>>>>>>>>>>>> In short, it: >>>>>>>>>>>>>>>>> - adds support for precise function reference with >>>>>>>> fully/partially >>>>>>>>>>>>>>>>> qualified name >>>>>>>>>>>>>>>>> - redefines function resolution order for ambiguous >>> function >>>>>>>>>>>>>> reference >>>>>>>>>>>>>>>>> - adds support for Hive's rich built-in functions >>>>>>>>>>>>>>>>> (support >>>>>> for >>>>>>>>>>>> Hive >>>>>>>>>>>>>> user >>>>>>>>>>>>>>>>> defined functions was already added in 1.9.0) >>>>>>>>>>>>>>>>> - clarifies the concept of temporary functions >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Would love to hear your thoughts. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Bowen >>>>>> -- >>>>>> Xuefu Zhang >>>>>> >>>>>> "In Honey We Trust!" >>>>>> >>>> >>> -- >>> Xuefu Zhang >>> >>> "In Honey We Trust!" >>> >