[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702593#comment-13702593 ]
Eric Hanson commented on HIVE-4822:
-----------------------------------

Since this is a performance improvement and not a functional improvement, and we are implementing a new query execution path, it is inevitable that there will be code growth. We have to take care that the new code path provides correct results. But we will be practical about this and re-use existing code or call the same library routines in the inner loop.

If you want to see an example of how we implemented some string functions, see StringUnaryUDF.java and its subclasses in the vectorization branch. StringLength.java takes a different approach and re-implements the length calculation for speed. It illustrates the tradeoffs involved in implementing built-in functions.

See also the template ColumnArithmeticScalar.txt to see how templates can keep the total amount of code reasonable while still getting top performance.

The 2005 paper on MonetDB X100 cited in the design spec is good background.

> implement vectorized math functions
> -----------------------------------
>
>                 Key: HIVE-4822
>                 URL: https://issues.apache.org/jira/browse/HIVE-4822
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: vectorization-branch
>            Reporter: Eric Hanson
>
> Implement vectorized support for all the built-in math functions. This
> includes implementing the vectorized operation, and tying it all together in
> VectorizationContext so it runs end-to-end.
> These functions include:
> round(Col)
> Round(Col, N)
> Floor(Col)
> Ceil(Col)
> Rand(), Rand(seed)
> Exp(Col)
> Ln(Col)
> Log10(Col)
> Log2(Col)
> Log(base, Col)
> Pow(col, p), Power(col, p)
> Sqrt(Col)
> Bin(Col)
> Hex(Col)
> Unhex(Col)
> Conv(Col, from_base, to_base)
> Abs(Col)
> Pmod(arg1, arg2)
> Sin(Col)
> Asin(Col)
> Cos(Col)
> ACos(Col)
> Atan(Col)
> Degrees(Col)
> Radians(Col)
> Positive(Col)
> Negative(Col)
> Sign(Col)
> E()
> Pi()
>
> To reduce the total code volume, do an implicit type cast from non-double
> input types to double.
> Also, POSITIVE and NEGATIVE are syntactic sugar for unary + and unary -, so
> reuse code for those as appropriate.
> Try to call the function directly in the inner loop and avoid new() or
> expensive operations, as appropriate.
> Templatize the code where appropriate, e.g. all the unary functions of the form
> DOUBLE func(DOUBLE)
> can probably be done with a template.
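
For illustration only, here is a minimal sketch of the DOUBLE func(DOUBLE) shape described in the issue, using Sqrt as the example. It is not code from the vectorization branch: DoubleColumn, SqrtDoubleToDouble and CastLongToDouble are hypothetical, simplified stand-ins, and the selected/n parameters are assumptions about how a batch of rows might be passed in. The intent is just to show the function called directly in the inner loop, no allocation per row, and a widening cast so non-double inputs can reuse the double version.

```java
/** Simplified stand-in for a batch column of doubles (hypothetical, not Hive's class). */
final class DoubleColumn {
    final double[] values;
    final boolean[] isNull;
    boolean noNulls = true;

    DoubleColumn(int capacity) {
        values = new double[capacity];
        isNull = new boolean[capacity];
    }
}

/**
 * One concrete instance of the DOUBLE func(DOUBLE) shape. A template could
 * generate a class like this once per function, substituting the class name
 * and the Math.sqrt call below.
 */
final class SqrtDoubleToDouble {
    /**
     * Evaluates the first n rows, or the n rows listed in selected when it is
     * non-null. The output arrays are preallocated by the caller, so the loops
     * allocate nothing.
     */
    void evaluate(DoubleColumn in, DoubleColumn out, int[] selected, int n) {
        out.noNulls = in.noNulls;
        if (selected == null) {
            for (int i = 0; i < n; i++) {
                out.isNull[i] = in.isNull[i];
                out.values[i] = Math.sqrt(in.values[i]); // direct call in the inner loop
            }
        } else {
            for (int j = 0; j < n; j++) {
                int i = selected[j];
                out.isNull[i] = in.isNull[i];
                out.values[i] = Math.sqrt(in.values[i]);
            }
        }
    }
}

/** Widens a long input to double so the double implementation can be reused. */
final class CastLongToDouble {
    static void evaluate(long[] in, DoubleColumn out, int n) {
        for (int i = 0; i < n; i++) {
            out.values[i] = (double) in[i];
        }
    }
}
```

With a layout like this, Sqrt over a long column would be the cast followed by SqrtDoubleToDouble.evaluate, which is one way to read the "implicit type cast" suggestion. The actual expressions in the branch may be organized differently.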