[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844609#comment-13844609 ]
Eric Hanson commented on HIVE-5356:
-----------------------------------

I'd prefer that we modify this change to preserve the backward-compatible behavior that int / int yields double. Here’s why:

* It won’t break existing applications.
* The existing behavior is quite reasonable and I’ve never heard anybody complain about it. When you divide integers, you often want the information after the decimal. In Hive, you get it now without having to do a type cast. It’s kind of convenient. I think it’s a minor issue that it is not SQL-standard compliant.
* Double precision divide is almost two orders of magnitude faster than decimal divide.
* It will allow vectorized integer-integer divide to keep working (fixing a regression caused by the patch).
* Hive is production software with a lot of users. Users do “create table as select …” in their workflows quite often. Their applications are depending on the output data types produced. Changing the result of “create table foo as select intCol1 / intCol2 as newCol, …” so that the data type of newCol is different (decimal instead of double) will be seen by some people as a breaking change in their application. Even if it is not a breaking change functionally, it can cause performance regressions for future queries on the data, since they will be then processing decimal instead of double.
* Decimal is a heavy-weight data type that I don’t think should ever be produced by an operator unless the user explicitly asked for it, or one of the input types was decimal. It’s inherently slower to do decimal arithmetic than integer/long/float/double arithmetic. Hive is used in performance-oriented, data warehouse database applications. I don’t think, in general, its code should be changed in a way that invites or causes performance regressions in people’s applications.
* Hive has a small development community.
This type of change generates code churn for the community with no strong benefit to the users that I can see, and significant downside to the users.

I appreciate the effort by contributors to make the decimal(p, s) data type work in Hive. People want to be able to represent currency and very long integer values, and this will help do that nicely. But I would like to see that they ask for it before they get expression results that use it.

If there is a real strong reason and desire to make the result SQL standard compliant, I think int as a result of int/int is a better choice. Then it'd probably be necessary to deprecate the old way and have a switch to control the behavior for a while.

> Move arithmatic UDFs to generic UDF implementations
> ---------------------------------------------------
>
>                 Key: HIVE-5356
>                 URL: https://issues.apache.org/jira/browse/HIVE-5356
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch,
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch,
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch,
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are
> implemented as old-style UDFs and java reflection is used to determine the
> return type TypeInfos/ObjectInspectors, based on the return type of the
> evaluate() method chosen for the expression. This works fine for types that
> don't have type params.
> Hive decimal type participates in these operations just like int or double.
> Different from double or int, however, decimal has precision and scale, which
> cannot be determined by just looking at the return type (decimal) of the UDF
> evaluate() method, even though the operands have certain precision/scale.
> With the default of "decimal" without precision/scale, then (10, 0) will be
> the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be
> implemented as GenericUDFs, which allow returning ObjectInspector during the
> initialize() method. The object inspectors returned can carry type params,
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDF implemented in non-generic way, if
> the return type of the chosen evaluate() method is decimal, the return type
> actually has (10,0) as precision/scale, which might not be desirable. This
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit
> the scope of review. The remaining ones will be covered under HIVE-5706.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
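
For readers following the quoted issue description, here is a minimal sketch in plain Java of why reflection on an evaluate() method cannot recover precision and scale. It does not use any Hive classes: ReflectionTypeDemo and OldStyleDivide are hypothetical names invented for this illustration, and java.math.BigDecimal stands in for Hive's decimal type. The fixed result scale of 6 is likewise an arbitrary assumption, not Hive's rule.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ReflectionTypeDemo {

    // Hypothetical stand-in for an old-style UDF: a framework would pick an
    // evaluate() overload and read its declared return type via reflection.
    public static class OldStyleDivide {
        public BigDecimal evaluate(BigDecimal a, BigDecimal b) {
            // Scale 6 is an arbitrary choice for this sketch.
            return a.divide(b, 6, RoundingMode.HALF_UP);
        }
    }

    // Returns everything reflection can say about the result type:
    // the class name only, with no precision or scale attached.
    public static String reflectedReturnType() {
        try {
            Method m = OldStyleDivide.class.getMethod(
                    "evaluate", BigDecimal.class, BigDecimal.class);
            return m.getReturnType().getName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Static view: just "java.math.BigDecimal".
        System.out.println("declared type: " + reflectedReturnType());

        // Runtime view: the value itself knows its precision and scale,
        // but that information is not visible from the method signature.
        BigDecimal result = new OldStyleDivide()
                .evaluate(new BigDecimal("10.50"), new BigDecimal("3"));
        System.out.println("value: " + result
                + " precision=" + result.precision()
                + " scale=" + result.scale());
    }
}
```

The declared return type is the same no matter what precision and scale the operands carry, which is the gap the description says GenericUDFs close by returning an ObjectInspector with type params from initialize().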