[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845051#comment-13845051 ]
Xuefu Zhang commented on HIVE-5356:
-----------------------------------

{quote}
1. The changes to floating point arithmetic are not backward compatible, and there is no SQL compliance benefit for that.
{quote}
The main reason is to be in line with MySQL and to simplify the implementation. It could be kept backward compatible, though.

{quote}
2.2 It will not be backward compatible with some udf implementations (I believe this is same with change in floating point return type).
{quote}
The SQL standard says that division of exact types should result in an exact type, so double is non-compliant. Changing to an int type has the same issue you're referring to.

{quote}
2.2 Integer arithmetic becoming NULL in some cases
{quote}
First, I don't think there is any standard saying that integer operations should not emit NULL. NULL is generated when an error occurs (such as overflow, divide by zero, etc.). Emitting NULL is one of the few error-handling options a modern database has, and it is the only one Hive has, though Hive isn't consistent about it. I'd argue that generating bad or wrong values in error cases is worse than generating NULL; to make things worse, the user is not even aware of it (take HIVE-5660 as an example). We may introduce different server modes to configure different error handling (HIVE-5438).

{quote}
2.3 more than 50x performance degradation for the arithmetic operation
{quote}
The 50x performance degradation came from a unit test, which doesn't necessarily represent Hive's overall performance; Hive's performance will not be judged solely by int/int. The bigger question is: do we want something that works and is correct, or something that is fast but wrong? Performance can be improved down the road, but deviations in functionality are hard to correct, as this discussion has demonstrated.

Backward compatibility is a valid concern. However, the question is whether Hive is at a point where it has to be preserved at any cost, or whether we are willing to sacrifice some of it to achieve something we believe is right. I have seen arguments favoring implementation over functionality and performance over correctness, which are, in my opinion, ill-constructed.
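To illustrate the points about error handling and exact-typed division, here is a plain-Java sketch (not Hive code; the class and helper names are made up for this example) contrasting silent integer wrap-around with returning NULL, and approximate double division with an exact decimal result:

{code:java}
// Standalone illustration; Integer (nullable) stands in for SQL NULL.
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ArithmeticSemanticsDemo {

  // Hypothetical helper mirroring the "emit NULL on error" behavior:
  // returns null when the exact sum does not fit in an int.
  static Integer addOrNull(int a, int b) {
    long exact = (long) a + (long) b;
    if (exact > Integer.MAX_VALUE || exact < Integer.MIN_VALUE) {
      return null;
    }
    return (int) exact;
  }

  public static void main(String[] args) {
    // Raw int arithmetic wraps around silently: the user gets a wrong value
    // with no indication that anything went wrong.
    int wrapped = Integer.MAX_VALUE + 1;            // -2147483648
    System.out.println("silent wrap-around: " + wrapped);

    // Detecting the error and emitting NULL at least makes the failure visible.
    System.out.println("null on overflow:   " + addOrNull(Integer.MAX_VALUE, 1));

    // Double division of two integers yields an approximate result, while a
    // decimal (exact-typed) result keeps a declared scale.
    System.out.println("approximate 1/3:    " + (1d / 3d));
    System.out.println("exact-typed 1/3:    "
        + BigDecimal.ONE.divide(BigDecimal.valueOf(3), 6, RoundingMode.HALF_UP));
  }
}
{code}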
> Move arithmatic UDFs to generic UDF implementations
> ---------------------------------------------------
>
>                 Key: HIVE-5356
>                 URL: https://issues.apache.org/jira/browse/HIVE-5356
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs, and Java reflection is used to determine the return TypeInfos/ObjectInspectors based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params.
>
> Hive's decimal type participates in these operations just like int or double. Unlike double or int, however, decimal has precision and scale, which cannot be determined by looking only at the return type (decimal) of the UDF evaluate() method, even though the operands have a definite precision/scale. With a default of "decimal" without precision/scale, (10, 0) becomes the type params, which is certainly not desirable.
>
> To solve this problem, all of the arithmetic operators need to be implemented as GenericUDFs, which allow returning an ObjectInspector from the initialize() method. The object inspectors returned can carry type params, from which the "exact" return type can be determined.
>
> It's worth mentioning that, for user UDFs implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10, 0) as precision/scale, which might not be desirable. This needs to be documented.
>
> This JIRA covers minus, plus, divide, multiply, mod, and pmod to limit the scope of review. The remaining operators will be covered under HIVE-5706.
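As an illustration of the mechanism described above, here is a minimal GenericUDF sketch assuming decimal operands; the class name and the simplified precision/scale rule are made up for this example and are not taken from the attached patches:

{code:java}
// Illustrative sketch only: shows how initialize() can return an ObjectInspector
// that carries decimal precision/scale, which reflection on evaluate() cannot do.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.HiveDecimalObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public class GenericUDFDecimalAddSketch extends GenericUDF {

  private HiveDecimalObjectInspector leftOI;
  private HiveDecimalObjectInspector rightOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2
        || !(arguments[0] instanceof HiveDecimalObjectInspector)
        || !(arguments[1] instanceof HiveDecimalObjectInspector)) {
      throw new UDFArgumentException("Two decimal arguments are expected");
    }
    leftOI = (HiveDecimalObjectInspector) arguments[0];
    rightOI = (HiveDecimalObjectInspector) arguments[1];

    DecimalTypeInfo t1 = (DecimalTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(leftOI);
    DecimalTypeInfo t2 = (DecimalTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(rightOI);

    // Simplified result-type rule for addition (not necessarily the patch's rule):
    // keep the larger scale and enough integer digits, capped at 38.
    int scale = Math.max(t1.scale(), t2.scale());
    int intDigits = Math.max(t1.precision() - t1.scale(), t2.precision() - t2.scale()) + 1;
    int precision = Math.min(38, intDigits + scale);

    // The returned ObjectInspector carries the exact precision/scale of the result.
    return PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(
        new DecimalTypeInfo(precision, scale));
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object left = arguments[0].get();
    Object right = arguments[1].get();
    if (left == null || right == null) {
      return null; // SQL NULL propagates through arithmetic
    }
    return new HiveDecimalWritable(
        leftOI.getPrimitiveJavaObject(left).add(rightOI.getPrimitiveJavaObject(right)));
  }

  @Override
  public String getDisplayString(String[] children) {
    return "(" + children[0] + " + " + children[1] + ")";
  }
}
{code}

With the old reflection-based approach the planner would only see "decimal" and fall back to (10, 0); here the ObjectInspector returned from initialize() advertises the computed precision and scale directly.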