[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843517#comment-13843517 ]
Xuefu Zhang commented on HIVE-5356:
-----------------------------------

{quote}
Before this patch was committed, integer-integer division vectorized. Now it does not. This is a performance regression and also a functional regression for "EXPLAIN". This may have been caught by the vectorization tests (see test output in comment above on about 3 Nov), but maybe it was not clear to the developers of this patch because vectorization is pretty new. If a vectorization test .q.out file contains in EXPLAIN output the string "Vectorized execution: true" then the plan vectorizes. It is important that future patches not regress this behavior for performance reasons. I would like to see any regressions to vectorization be fixed before patches are applied, ideally, or else have some discussion and consensus.
{quote}

While this patch may prevent vectorization for int/int, I don't think we should emphasize implementation over functionality, as this has occurred over and over again. I also disagree with the label of "functional regression" for obvious reasons. Rather, I think functionality prevails over implementation. A feature with wrong functionality is as bad as, if not worse than, one with bad performance. Having said this, I still support vectorization, but I would not use this to kill anything that might impact vectorization.

> Move arithmatic UDFs to generic UDF implementations
> ---------------------------------------------------
>
>                 Key: HIVE-5356
>                 URL: https://issues.apache.org/jira/browse/HIVE-5356
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>   Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch,
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch,
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch,
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are
> implemented as old-style UDFs, and Java reflection is used to determine the
> return type TypeInfos/ObjectInspectors, based on the return type of the
> evaluate() method chosen for the expression. This works fine for types that
> don't have type params.
> Hive decimal type participates in these operations just like int or double.
> Unlike double or int, however, decimal has precision and scale, which
> cannot be determined by just looking at the return type (decimal) of the UDF
> evaluate() method, even though the operands have certain precision/scale.
> With the default "decimal" without precision/scale, (10, 0) will be
> the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be
> implemented as GenericUDFs, which allow returning an ObjectInspector from the
> initialize() method. The object inspectors returned can carry type params,
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDFs implemented in a non-generic way, if
> the return type of the chosen evaluate() method is decimal, the return type
> actually has (10,0) as precision/scale, which might not be desirable. This
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit
> the scope of review. The remaining ones will be covered under HIVE-5706.
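
The return-type problem described in the issue can be illustrated with a minimal, self-contained Java sketch. It does not use Hive's API: `DecimalType`, `OldStylePlus`, `reflectedReturnType`, and `plusResultType` are hypothetical names introduced only for illustration, and the precision/scale rule for addition is an assumed example rule, not taken from the patch. The sketch contrasts what reflection on an `evaluate()` method can report (only the Java class) with what an initialize-style step can compute when it is handed the operand types.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;

public class DecimalReturnTypeSketch {

    /** Hypothetical stand-in for a decimal type carrying its type params. */
    static final class DecimalType {
        final int precision;
        final int scale;

        DecimalType(int precision, int scale) {
            this.precision = precision;
            this.scale = scale;
        }

        @Override
        public String toString() {
            return "decimal(" + precision + "," + scale + ")";
        }
    }

    /** Old-style shape: the return type is discovered by reflecting on evaluate(). */
    static final class OldStylePlus {
        public BigDecimal evaluate(BigDecimal a, BigDecimal b) {
            return a.add(b);
        }
    }

    /** All that reflection can report: a Java class, with no precision or scale. */
    static Class<?> reflectedReturnType() {
        try {
            Method m = OldStylePlus.class.getMethod(
                    "evaluate", BigDecimal.class, BigDecimal.class);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Generic-style shape: the result type is computed from the operand types
     * before any row is evaluated. The rule below is an assumption chosen for
     * illustration: keep the larger scale, keep the larger number of integer
     * digits, and add one digit for a possible carry.
     */
    static DecimalType plusResultType(DecimalType left, DecimalType right) {
        int scale = Math.max(left.scale, right.scale);
        int intDigits = Math.max(left.precision - left.scale,
                                 right.precision - right.scale);
        return new DecimalType(intDigits + scale + 1, scale);
    }

    public static void main(String[] args) {
        // Reflection sees only the class, so the type params are lost.
        System.out.println(reflectedReturnType().getSimpleName());
        // The operand types are enough to derive an exact result type.
        System.out.println(plusResultType(new DecimalType(5, 2), new DecimalType(7, 3)));
    }
}
```

Under these assumptions, the reflective path yields only `BigDecimal`, while the operand-aware path yields `decimal(8,3)` for `decimal(5,2) + decimal(7,3)`. This is the gap that returning an ObjectInspector from initialize() is meant to close.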