Hi Hive users,

I would like to pursue the discussion that happened during the design of
the feature:
https://issues.apache.org/jira/browse/HIVE-6167

Some concerns were raised back then, and now that the feature has been
implemented, I think some user feedback could add to the discussion.

Even if I understand the utility of grouping UDFs inside databases, I find
it really annoying not to be able to define my UDFs globally.

For me, one of the main interests of UDFs is to extend the built-in Hive
functions with the company's user-defined functions, either because some
useful generic functions are missing from the built-in functions or to add
business-specific functions.

In the latter case, I understand very well the necessity of qualifying them
with a business-specific database name. But in the former case?


Let's take an example:
Several times we needed a Hive UDF that did not exist yet in the Hive
version we were running. To use it, all we had to do was take the UDF's
source code from a more recent version of Hive, build it into a JAR, and
add the UDF manually.

When we upgraded, we only had to remove our UDF since it was now built-in.

(To be more specific, it happened with collect_list prior to Hive 0.13.)

With HIVE-6167, this became impossible, since we have to create a
"database_name.function_name" and use it as is. Hence, when upgrading, we
need to replace "database_name.function_name" with "function_name"
everywhere.
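
To make the difference concrete, here is a rough sketch in HiveQL. It
assumes the old approach was a session-level temporary function; the JAR
paths, the class name, the names my_udf and my_db, and the table src with
its columns are all hypothetical placeholders, not our real ones.

```sql
-- Old approach (assumed): register the backported UDF per session under a
-- bare name. JAR path and class name are hypothetical placeholders.
ADD JAR /path/to/backported-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.udf.MyBackportedUdf';

-- Queries call it exactly like a built-in function.
SELECT key, my_udf(value) FROM src;

-- Permanent function after HIVE-6167: the name is qualified by a database.
CREATE FUNCTION my_db.my_udf AS 'com.example.hive.udf.MyBackportedUdf'
  USING JAR 'hdfs:///path/to/backported-udfs.jar';

-- Queries now have to carry the database name, and would need to be
-- edited once my_udf becomes a built-in function.
SELECT key, my_db.my_udf(value) FROM src;
```

With the first form, removing the UDF after an upgrade leaves the queries
untouched; with the second, every call site has to change.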

This is just an example, but I would like to emphasize that sometimes we
want to create permanent UDFs that are as global as built-in UDFs, without
having to care whether a function is built-in or user-defined. As
someone pointed out in HIVE-6167's discussion, imagine if all the built-in
UDFs had to be called with "sys.function_name".

I would just like to have other Hive users' feedback on that matter.

Has anyone else had similar issues with this behavior? How did you handle
them?

Maybe it would make sense to create a feature request for being able to
specify a GLOBAL keyword when creating a permanent UDF, when we really want
it to be global?

What do you think?

Regards,

Furcy
