[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844609#comment-13844609 ]

Eric Hanson commented on HIVE-5356:
-----------------------------------

I'd prefer that we modify this change to preserve the backward-compatible 
behavior that int / int yields double. Here’s why:

It won’t break existing applications.

The existing behavior is quite reasonable and I’ve never heard anybody complain 
about it. When you divide integers, you often want the information after the 
decimal. In Hive, you get it now without having to do a type cast. It’s kind of 
convenient. I think it’s a minor issue that it is not SQL-standard compliant. 
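By analogy (a sketch, not Hive code — Python 3's `/` happens to follow the 
same convention, promoting integer operands to a floating-point result):

```python
# Python 3 analogy for Hive's current behavior: dividing two integers
# with "/" yields a floating-point result, so the fractional part is
# kept without any explicit cast.
a, b = 7, 2
quotient = a / b
print(quotient)        # 3.5
print(type(quotient))  # <class 'float'>
```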

Double-precision divide is almost two orders of magnitude faster than decimal 
divide.
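As a rough illustration of the cost gap (a Python micro-benchmark comparing 
hardware floating-point division with software decimal division; the exact 
ratio varies by platform and is not a measurement of Hive itself):

```python
import timeit
from decimal import Decimal

# Compare hardware double division with software decimal division.
# Absolute times and ratios depend on the platform; the point is only
# that decimal division is consistently much more expensive.
t_double = timeit.timeit("x / y", globals={"x": 355.0, "y": 113.0},
                         number=100_000)
t_decimal = timeit.timeit("x / y",
                          globals={"x": Decimal("355"), "y": Decimal("113")},
                          number=100_000)
print(f"double divide:  {t_double:.4f}s")
print(f"decimal divide: {t_decimal:.4f}s")
```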

It will allow vectorized integer-integer divide to keep working (fixing a 
regression caused by the patch).

Hive is production software with a lot of users. Users do “create table as 
select …” in their workflows quite often. Their applications are depending on 
the output data types produced. Changing the result of “create table foo as 
select intCol1 / intCol2 as newCol, …” so that the data type of newCol is 
different (decimal instead of double) will be seen by some people as a breaking 
change in their application. Even if it is not a breaking change functionally, 
it can cause performance regressions for future queries on the data, since they 
will then be processing decimal instead of double.

Decimal is a heavy-weight data type that I don’t think should ever be produced 
by an operator unless the user explicitly asked for it, or one of the input 
types was decimal. It’s inherently slower to do decimal arithmetic than 
integer/long/float/double arithmetic. Hive is used in performance-oriented 
data warehouse applications. I don’t think, in general, its code 
should be changed in a way that invites or causes performance regressions in 
people’s applications.

Hive has a small development community. This type of change generates code 
churn for the community with no strong benefit to the users that I can see, and 
significant downside to the users.

I appreciate the effort by contributors to make the decimal(p, s) data type 
work in Hive. People want to be able to represent currency and very long 
integer values, and this will help do that nicely. But I would like users to 
ask for it explicitly before they get expression results that use it.

If there is a real strong reason and desire to make the result SQL standard 
compliant, I think int as a result of int/int is a better choice. Then it'd 
probably be necessary to deprecate the old way and have a switch to control the 
behavior for a while.
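A quick sketch of the two result conventions (Python again, with positive 
operands so that flooring and SQL-style truncation agree):

```python
# 7 / 2 under the two conventions discussed above:
print(7 // 2)  # 3   -- integer result, as the SQL standard prescribes
print(7 / 2)   # 3.5 -- double result, Hive's current behavior
# Note: Python's "//" floors rather than truncating toward zero, so the
# analogy only holds exactly for non-negative operands.
```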


> Move arithmatic UDFs to generic UDF implementations
> ---------------------------------------------------
>
>                 Key: HIVE-5356
>                 URL: https://issues.apache.org/jira/browse/HIVE-5356
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, 
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, 
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, 
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are 
> implemented as old-style UDFs and java reflection is used to determine the 
> return type TypeInfos/ObjectInspectors, based on the return type of the 
> evaluate() method chosen for the expression. This works fine for types that 
> don't have type params.
> Hive decimal type participates in these operations just like int or double. 
> Different from double or int, however, decimal has precision and scale, which 
> cannot be determined by just looking at the return type (decimal) of the UDF 
> evaluate() method, even though the operands have certain precision/scale. 
> With the default of "decimal" without precision/scale, (10, 0) will be used 
> as the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be 
> implemented as GenericUDFs, which allow returning ObjectInspector during the 
> initialize() method. The object inspectors returned can carry type params, 
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDFs implemented in the non-generic way, if 
> the return type of the chosen evaluate() method is decimal, the return type 
> actually has (10,0) as precision/scale, which might not be desirable. This 
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit 
> the scope of review. The remaining ones will be covered under HIVE-5706.
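The GenericUDF approach described in the quoted issue can be sketched as a toy 
model (plain Python with hypothetical names — not Hive's real API, and the 
precision/scale rule below is made up for illustration, not Hive's actual 
derivation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecimalType:
    """Toy stand-in for a type descriptor carrying precision/scale."""
    precision: int
    scale: int

def initialize_divide(left: DecimalType, right: DecimalType) -> DecimalType:
    # Reflection over evaluate()'s declared return type could only say
    # "decimal". An initialize()-style hook sees the operand types, so it
    # can compute concrete type params. The rule below is invented for
    # illustration; Hive defines its own derivation rules.
    precision = min(left.precision + right.precision + 1, 38)
    scale = max(6, left.scale + right.precision)
    return DecimalType(precision, scale)

result = initialize_divide(DecimalType(10, 2), DecimalType(5, 0))
print(result)  # DecimalType(precision=16, scale=7)
```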



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)