Thanks all for your input!
I've updated FLIP-57 accordingly. To summarize the changes:
- introduced the new concept of "temporary system functions", which have no
namespace and override built-in functions
- repositioned "temporary functions" to be those with namespaces and
override catalog
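The two concepts summarized above can be sketched roughly as follows. This is only an illustration, not Flink's actual function catalog code: the class name, its fields, and the relative order between built-in functions and catalog functions for an unqualified name are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical names throughout; this is not Flink's real API.
Identifier = Tuple[str, str, str]  # (catalog, database, function name)


class FunctionRegistry:
    def __init__(self, current_catalog: str, current_database: str) -> None:
        self.current_catalog = current_catalog
        self.current_database = current_database
        # No namespace: keyed by a plain function name.
        self.temp_system: Dict[str, Callable] = {}
        self.builtin: Dict[str, Callable] = {}
        # With namespace: keyed by a 3-part identifier.
        self.temp_catalog: Dict[Identifier, Callable] = {}
        self.catalog: Dict[Identifier, Callable] = {}

    def resolve(self, name: str) -> Optional[Callable]:
        """Resolve an unqualified function name (assumed lookup order)."""
        if name in self.temp_system:  # temporary system function wins
            return self.temp_system[name]
        if name in self.builtin:
            return self.builtin[name]
        ident = (self.current_catalog, self.current_database, name)
        if ident in self.temp_catalog:  # temporary function shadows catalog
            return self.temp_catalog[ident]
        return self.catalog.get(ident)


registry = FunctionRegistry("my_catalog", "my_db")
registry.builtin["upper"] = str.upper
registry.catalog[("my_catalog", "my_db", "greet")] = lambda s: "hello " + s

# A temporary system function overrides the built-in of the same name.
registry.temp_system["upper"] = lambda s: s.upper() + "!"
# A temporary function overrides the catalog function at the same path.
registry.temp_catalog[("my_catalog", "my_db", "greet")] = lambda s: "hi " + s

print(registry.resolve("upper")("abc"))
print(registry.resolve("greet")("bob"))
```

Dropping either temporary entry makes the shadowed built-in or catalog function visible again, which is the session-scoped behaviour the thread is after.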
"SYSTEM" sounds good to me too.
Best,
Jark
On Mon, 23 Sep 2019 at 19:04, Fabian Hueske wrote:
+1 for CREATE TEMPORARY SYSTEM FUNCTION xxx
Cheers, Fabian
On Sat, 21 Sep 2019 at 06:58, Bowen Li wrote:
"SYSTEM" sounds good to me. FYI, this FLIP only impacts low level of the
SQL function stack and won't actually involve any DDL, thus I will just
document the decision and we should keep it in mind when it's time to
implement the DDLs.
I'm in the process of updating the FLIP to reflect changes requ
I also like the 'System' keyword. I think we can assume we reached
consensus on this topic.
On Sat, 21 Sep 2019, 06:37 Xuefu Z wrote:
+1 for using the keyword "SYSTEM". Thanks to Timo for chiming in!
--Xuefu
On Fri, Sep 20, 2019 at 3:28 PM Timo Walther wrote:
Hi everyone,
Sorry for the late reply. I also give +1 for option #2. Thus, I guess
we have a clear winner.
I would also like to find a better keyword/syntax for this statement.
Esp. the BUILTIN keyword can confuse people, because it could be written
as BUILTIN, BUILDIN, BUILT_IN, or BUILD_
Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
temporary built-in function in the same session? With the former one, they
can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the latter
one, I'm
Hi,
Thanks everyone for your votes. I summarized the results as follows:
#1: 3 (+1), 1 (0), 4 (-1) - net: -1
#2: 4 (0), 2 (+1), 1 (+0.5) - net: +2.5
Dawid -1/0 depending on keyword
#3: 2 (+1), 3 (-1), 3 (0) - net: -1
Given the result, I'd like to change my vote for #2 from 0 to +1, to
I agree, it's very similar from the implementation point of view and the
implications.
IMO, the difference is mostly on the mental model for the user.
Instead of having a special class of temporary functions that have
precedence over builtin functions it suggests to temporarily change
built-in fun
Hi Fabian,
I think it's almost the same as #2 with a different keyword:
CREATE TEMPORARY BUILTIN FUNCTION xxx
Best,
Kurt
On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske wrote:
Hi,
I thought about it a bit more and think that there is some good value in my
last proposal.
A lot of complexity comes from the fact that we want to allow overriding
built-in functions, which are addressed differently than other functions (and
db objects).
We could just have "CREATE TEMPORARY FUNC
Hi everyone,
I thought again about option #1 and something that I don't like is that the
resolved address of xyz is different in "CREATE FUNCTION xyz" and "CREATE
TEMPORARY FUNCTION xyz".
IMO, adding the keyword "TEMPORARY" should only change the lifecycle of the
function, but not where it is loca
After reading Kurt’s reasoning I’m bumping my vote for #2 from -1 to +0, or
even +0.5, so my final vote is:
-1 for #1
+0.5 for #2
+1 for #3
Re confusion about “system_db”. I think quite a lot of DBs are storing some
meta tables in some system and often hidden db/schema, so I don’t think that if
I know Hive and Spark can shadow built-in functions with temporary functions.
MySQL, Oracle, and SQL Server cannot shadow them.
Users can use full names to access functions instead of shadowing.
So I think it is a completely new thing, and the direct way to deal with new
things is to add new grammar. So,
+1
And let me make my vote complete:
-1 for #1
+1 for #2 with different keyword
-0 for #3
Best,
Kurt
On Thu, Sep 19, 2019 at 4:40 PM Kurt Young wrote:
Looks like I'm the only person who is willing to +1 to #2 for now :-)
But I would suggest changing the keyword from GLOBAL to
something like BUILTIN.
I think #2 and #3 are almost the same proposal, just with a different
format to indicate whether it wants to override built-in functions.
My biggest
Hi,
It is quite a long discussion to follow and I hope I didn’t misunderstand
anything. From the proposals presented by Xuefu I would vote:
-1 for #1 and #2
+1 for #3
Besides #3 being IMO more general and more consistent, having qualified names
(#3) would help/make easier for someone to use c
I agree with Xuefu that inconsistent handling with all the other objects is
not a big problem.
Regarding option #3, the special "system.system" namespace may confuse
users.
Users need to know the set of built-in function names to know when to use
"system.system" namespace.
What will happen if us
@Dawid, Re: we also don't need additional referencing the special catalog
anywhere.
True. But once we allow such a reference, users can use it in any
place where a function name is expected, and we would have to handle that.
That's a big difference, I think.
Thanks,
Xuefu
On Wed, Sep 18, 201
Re: The reason why I prefer option 3 is that in option 3 all objects
internally are identified with 3 parts.
True, but the problem we have is not about how to differentiate each type of
object internally. Rather, it's about how a user references an
object unambiguously and consistently.
Tha
@Bowen I am not suggesting introducing additional catalog. I think we need
to get rid of the current built-in catalog.
@Xuefu in option #3 we also don't need additional referencing the special
catalog anywhere else besides in the CREATE statement. The resolution
behaviour is exactly the same in bo
Hi Dawid,
"GLOBAL" is a temporary keyword that was given to the approach. It can be
changed to something better.
The difference between this and the #3 approach is that we only need the
keyword for this create DDL. For other places (such as function
referencing), no keyword or special na
Hi,
For #2, as Xuefu and I discussed offline, the key point is to introduce a
keyword to SQL DDL to distinguish temp functions that override built-in
functions from temp functions that override catalog functions. It can be
something other than "GLOBAL", like "BUILTIN" (e.g. "CREATE BUILTIN TEMP
FUNC
Last additional comment on Option 2. The reason why I prefer option 3 is
that in option 3 all objects internally are identified with 3 parts. This
makes it easier to handle at different locations e.g. while persisting
views, as all objects have uniform representation.
On Thu, 19 Sep 2019, 07:31 Da
Hi,
I think it makes sense to start voting at this point.
Option 1: Only 1-part identifiers
PROS:
- allows shadowing built-in functions
CONS:
- inconsistent with all the other objects, both permanent & temporary
- does not allow shadowing catalog functions
Option 2: Special keyword for built-in fu
Hi Aljoscha,
Thanks for the summary and these are great questions to be answered. The
answer to your first question is clear: there is a general agreement to
override built-in functions with temp functions.
However, your second and third questions are sort of related, as a function
reference can
Hi,
I think this discussion and the one for FLIP-64 are very connected. To resolve
the differences, I think we have to think about the basic principles and find
consensus there. The basic questions I see are:
- Do we want to support overriding builtin functions?
- Do we want to support overridi
Hi,
+1 to strive for reaching consensus on the remaining topics. We are close to
the truth. It will waste a lot of time if we resume the topic some time later.
+1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to
override a catalog function.
I’m not sure about “system.sy
Hi everyone,
@Xuefu: I would like to avoid adding too many things incrementally.
Users should be able to override all catalog objects consistently
according to FLIP-64 (Support for Temporary Objects in Table module). If
functions are treated completely differently, we need more code and
special
Hi everyone,
I think this FLIP is very meaningful. It supports functions that can be
shared by different catalogs and dbs, reducing the duplication of functions.
Our group implemented a create function feature based on Flink's SQL parser
module; it stores the parsed function metadata and schema into mys
Thanks to Timo and Dawid for sharing thoughts.
It seems to me that there is a general consensus on having temp functions
that have no namespaces and overwrite built-in functions. (As a side note
for comparability, the current user-defined functions are all temporary and
have no namespaces.)
Neve
Hi,
Another idea to consider on top of Timo's suggestion. How about we have a
special namespace (catalog + database) for built-in objects? This catalog
would be invisible for users as Xuefu was suggesting.
Then users could still override built-in functions, if they fully qualify
object with the bu
Hi Bowen,
I understand the potential benefit of overriding certain built-in
functions. I'm open to such a feature if many people agree. However, it
would be great to still support overriding catalog functions with
temporary functions in order to prototype a query even though a
catalog/databas
Hi Fabian,
Yes, I agree 1-part/no-override is the least favorable thus I didn't
include that as a voting option, and the discussion is mainly between
1-part/override builtin and 3-part/not override builtin.
Re > However, it means that temp functions are differently treated than
other db objects.
Hi all,
Thanks Dawid for the additional explanation!
As others summarized there are two questions:
1) Are temporary functions a) top-level functions (1-part address) and not
associated with a catalog/db or b) do we treat them like any other
database object with a 3-part address.
2) If we treat t
Hi,
Thanks @Fabian @Dawid and everyone else for sharing your thoughts!
First, I'd like to take Hive built-in functions out of this FLIP to keep
our original scope and make it less controversial on a potential modular
approach. I will remove Hive built-in functions from the google doc.
Then the f
Hi Fabian,
Thank you for your response.
Regarding the temporary function, just wanted to clarify one thing: the
3-part identifier does not mean the user always has to provide the catalog
& database explicitly. The same way user does not have to provide them in
e.g. when creating permanent table, vi
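As a rough illustration of the point above, a partially qualified name can be expanded to a 3-part identifier using the session's current catalog and database. The helper below is a hypothetical sketch, not Flink code; the function name and the plain dot-splitting (no quoting or escaping) are assumptions made for the example.

```python
from typing import Tuple


def qualify(path: str, current_catalog: str, current_database: str) -> Tuple[str, str, str]:
    """Expand a 1-, 2-, or 3-part name into (catalog, database, name).

    Hypothetical helper: missing leading parts are filled in from the
    session's current catalog and database.
    """
    parts = path.split(".")
    if len(parts) == 1:
        return (current_catalog, current_database, parts[0])
    if len(parts) == 2:
        return (current_catalog, parts[0], parts[1])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError("expected at most 3 parts, got: " + path)


print(qualify("my_func", "cat1", "db1"))
print(qualify("db2.my_func", "cat1", "db1"))
print(qualify("cat2.db2.my_func", "cat1", "db1"))
```

With this kind of expansion every object ends up with a uniform 3-part identity internally, while the user only types as many parts as needed.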
Hi all,
I'd like to add my opinion on this topic as well ;-)
In general, I think overriding built-in functions with temp functions has a
couple of benefits but also a few challenges:
* Users can reimplement the behavior of a built-in function of a different
system, e.g., for backward compatibili
Hi,
W.r.t. temp functions, I feel both options have their benefits and can
theoretically achieve similar functionalities one way or another. In the
end, it's more about use cases, user habits, and trade-offs.
Re> Not always users are in full control of the catalog functions. There is
also the cas
I agree the consequences of the decision are substantial. Let's see what
others think.
-- Catalog functions are defined by users, and we suppose they can
drop/alter it in any way they want. Thus, overwriting a catalog function
doesn't seem to be a strong use case that we should be concerned about.
Hi Dawid,
Thank you for your summary. While the only difference in the two proposals
is one- or three-part in naming, the consequence would be substantial.
To me, there are two major use cases of temporary functions compared to
persistent ones:
1. Temporary in nature and auto managed by the sessi
Hi Xuefu,
Thank you for your answers.
Let me summarize my understanding. In principle we differ only in
regard to whether a temporary function can be only 1-part or only
3-part identified. I can reconfirm that if the community decides it
prefers the 1-part approach I will commit to that, wit
Hi Dawid,
Thanks for sharing your thoughts and requesting clarifications. I believe
that I fully understood your proposal, which does have its merits. However,
it's different from ours. Here are the answers to your questions:
Re #1: yes, the temp functions in the proposal are global and have just
Yeah, sorry I prematurely concluded the discussions here, thinking they
were not converging. However, I did feel we needed to do more research and
restart with individual topics.
Please continue voicing your comments/suggestions while we revise and
clarify our proposal.
Thanks,
Xuefu
On Thu, Sep
Hi Xuefu,
Just wanted to summarize my opinion on the one topic (temporary functions).
My preference would be to make temporary functions always 3-part
qualified (as a result that would prohibit overriding built-in
functions). Having said that if the community decides that it's better
to allow ove
Maybe Xuefu missed my email. Please let me know what your thoughts are on
the summary, if there's still major controversy, I can take time to
reevaluate that part.
On Wed, Sep 4, 2019 at 2:25 PM Xuefu Z wrote:
Thanks all for sharing your thoughts. I think we have gathered some useful
initial feedback from this long discussion with a couple of focal points
sticking out.
We will go back to do more research and adapt our proposal. Once it's
ready, we will ask for a new round of review. If there is any disag
Hi Dawid,
Thanks for sharing the findings about temporary functions. Because of the
strong inconsistency observed in Spark, we can probably ignore it for now.
For Hive, I understand one may not be able to overwrite everything, but the
capability is being offered.
Whether we offer this capability is t
Let me try to summarize and conclude the long thread so far:
1. For order of temp function vs. built-in function:
I think Dawid's point that temp functions should have a fully qualified path
is better reasoning to back the newly proposed order, and I agree we
don't need to follow Hive/Spark.
Ho
Hi,
Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
are very inconsistent in that regard (Spark being way worse).
Hive:
You cannot overwrite all the built-in functions. I could overwrite most
of t
Hi all,
thanks for the healthy discussion. It is already a very long discussion
with a lot of text. So I will just post my opinion to a couple of
statements:
> Hive built-in functions are not part of Flink built-in functions,
they are catalog functions
That is not entirely true. Correct me
Hi all,
Regarding #1 temp function <> built-in function and naming.
I'm fine with temp functions preceding built-in functions and being able
to override them (we already support overriding built-in functions in 1.9).
If we don't allow the same name as a built-in function, I'm afraid we wil
Hi Dawid,
Thank you for sharing your findings. It seems to me that there is no SQL
standard regarding temporary functions. There are few systems that support
it. Here is what I have found:
1. Hive: no DB qualifier allowed. Can overwrite built-in.
2. Spark: basically follows Hive (
https://docs.d
Hi all,
Just an opinion on the built-in <> temporary functions resolution and
NAMING issue. I think we should not allow overriding the built-in
functions, as this may pose serious issues and to be honest is rather
not feasible and would require major rework. What happens if a user
wants to overrid
Hi,
I agree with Xuefu that the controversial points are mainly in two
places. My thoughts on them:
1) Determinism of referencing Hive built-in functions. We can either remove
Hive built-in functions from ambiguous function resolution and require
users to use special syntax for their qualif
Thank you for your wonderful points.
I like Timo's proposal to extend built-in functions into flexible function
modules (for example, a financial module would be useful for banking systems).
But I agree with Bowen: I don't think Hive functions deserve to be a
function module. I think all function modules sho
From what I have seen, there are a couple of focal disagreements:
1. Resolution order: temp function --> Flink built-in function --> catalog
function vs. Flink built-in function --> temp function --> catalog function.
2. "External" built-in functions: how to treat built-in functions in
external sys
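To make the first disagreement concrete, here is a small sketch of how the two candidate resolution orders behave when a name exists in more than one place. The layer names, the sample function names, and the string placeholders standing in for implementations are all made up for illustration.

```python
from typing import Dict, List, Optional


def resolve(name: str, order: List[str], layers: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the first match for `name`, searching layers in the given order."""
    for layer in order:
        impl = layers[layer].get(name)
        if impl is not None:
            return impl
    return None


# Hypothetical contents; strings stand in for function implementations.
layers = {
    "temp": {"concat": "temp concat"},
    "builtin": {"concat": "built-in concat"},
    "catalog": {"concat": "catalog concat", "my_udf": "catalog my_udf"},
}

TEMP_FIRST = ["temp", "builtin", "catalog"]
BUILTIN_FIRST = ["builtin", "temp", "catalog"]

print(resolve("concat", TEMP_FIRST, layers))
print(resolve("concat", BUILTIN_FIRST, layers))
print(resolve("my_udf", TEMP_FIRST, layers))
```

The two orders only differ for names that collide between temp functions and built-in functions; names that live in a single layer resolve the same way under both.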
Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
same as Bowen's. But after thinking about it, I'm currently leaning toward
Timo's suggestion.
The reason is backward compatibility. If we follow Bowen's approach, let's
say we first look up the function in Flink's built-in functions, and
Hi all,
Thanks for the feedback. Just a kind reminder that the [Proposal] section
in the google doc was updated, please take a look first and let me know if
you have more questions.
On Tue, Sep 3, 2019 at 4:57 PM Bowen Li wrote:
Hi Timo,
Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convenient for users to use Hive built-in
Hi Jingsong,
Re> 1.Hive built-in functions is an intermediate solution. So we should
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.
Yes, please see the doc.
Re> 2.Non-flink built-in functions are ea
Hi Kurt,
Re: > What I want to propose is we can merge #3 and #4, make them both under
>"catalog" concept, by extending catalog function to make it have ability to
>have built-in catalog functions. Some benefits I can see from this
approach:
>1. We don't have to introduce new concept like external
Hi Kurt,
it should not affect the functions and operations we currently have in
SQL. It just categorizes the available built-in functions. It is kind of
an orthogonal concept to the catalog API but built-in functions deserve
this special kind of treatment. CatalogFunction still fits perfectly
This only affects the functions and operations we currently have in SQL
and has no effect on tables, right? It looks like this is a concept
orthogonal to Catalog?
If both answers are yes, then the catalog function will be a weird
concept?
Best,
Kurt
On Tue, Sep 3, 2019 at 8:10 PM Danny C
The way you proposed is basically the same as what Calcite does; I think we
are on the same page.
Best,
Danny Chan
On Sep 3, 2019 at 7:57 PM +0800, Timo Walther wrote:
This sounds exactly like the module approach I mentioned, no?
Regards,
Timo
On 03.09.19 13:42, Danny Chan wrote:
Thanks Bowen for bringing up this topic, I think it’s a useful refactoring to make
our function usage more user friendly.
For the topic of how to organize the builtin operators and operators of Hive,
here is a solution from Apache Calcite, the Calcite way is to make every
dialect operators a “Libr
Hi Bowen,
thanks for your proposal. Here are some thoughts:
1) We should not have the restriction "hive built-in functions can only
be used when current catalog is hive catalog". Switching a catalog
should only have implications on the cat.db.object resolution but not
functions. It would be q
Thanks Bowen:
+1 for this. And +1 to Kurt's suggestion. My other points are:
1.Hive built-in functions is an intermediate solution. So we should
not introduce interfaces to influence the framework. To make
Flink itself more powerful, we should implement the functions
we need to add.
2.Non-fli
Thanks Bowen for driving this.
+1 for the general idea. It makes the function resolution behavior clearer
and more deterministic. Besides, the user can use all hive built-in
functions, which is a great feature.
I only have one comment, but maybe it may touch your design so I think
it would make sense
Thanks everyone for the feedback.
I have updated the document accordingly. Here's the summary of changes:
- clarify the concept of temporary functions, to facilitate deciding
function resolution order
- provide two options to support Hive built-in functions, with the 2nd one
being preferred
- ad