[ 
https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702593#comment-13702593
 ] 

Eric Hanson commented on HIVE-4822:
-----------------------------------

Since this is a performance improvement and not a functional improvement, and
we are implementing a new query execution path, it is inevitable that there
will be code growth. We have to take care that the new code path produces
correct results, but we will be practical about this: wherever possible we
will reuse existing code or call the same library routines in the inner loop.

For an example of how we implemented some string functions, see
StringUnaryUDF.java and its subclasses in the vectorization branch.
StringLength.java takes a different approach and re-implements the length
calculation for speed. Together they illustrate the tradeoffs in implementing
built-in functions. See also the template ColumnArithmeticScalar.txt for how
templates can keep the total amount of code reasonable while still getting
top performance.
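
To make the StringLength tradeoff concrete, here is a rough sketch (not the
code from StringLength.java; the class and method names are hypothetical).
It assumes string values are stored as UTF-8 bytes with a start offset and a
byte length per row. The first method leans on the standard library and
allocates a String per row; the second counts characters directly by skipping
UTF-8 continuation bytes, so the inner loop allocates nothing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the vectorization branch.
final class Utf8LengthSketch {

    // Library approach: decode the bytes into a String, then count code points.
    // Easy to trust, but it allocates a new String for every row.
    static long lengthViaLibrary(byte[] bytes, int start, int len) {
        String s = new String(bytes, start, len, StandardCharsets.UTF_8);
        return s.codePointCount(0, s.length());
    }

    // Hand-written approach: every UTF-8 character has exactly one byte that
    // is not a continuation byte (10xxxxxx), so count those. No allocation.
    static long lengthViaByteScan(byte[] bytes, int start, int len) {
        long count = 0;
        for (int i = start; i < start + len; i++) {
            if ((bytes[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // Apply the byte scan to the first n rows of a batch, writing the results
    // into a preallocated output array. Null handling is omitted for brevity.
    static void evaluate(byte[][] vector, int[] start, int[] length, int n, long[] out) {
        for (int i = 0; i < n; i++) {
            out[i] = lengthViaByteScan(vector[i], start[i], length[i]);
        }
    }
}
```

For well-formed UTF-8 the two methods agree. The byte scan is more code to
maintain and test, which is the price paid for avoiding per-row allocation.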

The 2005 paper on MonetDB X100 cited in the design spec is good background.
                
> implement vectorized math functions
> -----------------------------------
>
>                 Key: HIVE-4822
>                 URL: https://issues.apache.org/jira/browse/HIVE-4822
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: vectorization-branch
>            Reporter: Eric Hanson
>
> Implement vectorized support for all the built-in math functions. This 
> includes implementing the vectorized operation, and tying it all together in 
> VectorizationContext so it runs end-to-end. These functions include:
> Round(Col)
> Round(Col, N)
> Floor(Col)
> Ceil(Col)
> Rand(), Rand(seed)
> Exp(Col)
> Ln(Col)
> Log10(Col)
> Log2(Col)
> Log(base, Col)
> Pow(col, p), Power(col, p)
> Sqrt(Col)
> Bin(Col)
> Hex(Col)
> Unhex(Col)
> Conv(Col, from_base, to_base)
> Abs(Col)
> Pmod(arg1, arg2)
> Sin(Col)
> Asin(Col)
> Cos(Col)
> ACos(Col)
> Atan(Col)
> Degrees(Col)
> Radians(Col)
> Positive(Col)
> Negative(Col)
> Sign(Col)
> E()
> Pi()
> To reduce the total code volume, do an implicit type cast from non-double 
> input types to double. 
> Also, POSITIVE and NEGATIVE are syntactic sugar for unary + and unary -, so 
> reuse code for those as appropriate.
> Try to call the function directly in the inner loop and avoid new() or 
> expensive operations, as appropriate.
> Templatize the code where appropriate, e.g. all the unary functions of the form 
> DOUBLE func(DOUBLE)
> can probably be done with a template.
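
The DOUBLE func(DOUBLE) pattern described above might look roughly like the
sketch below. This is an illustration under stated assumptions, not code from
the vectorization branch: the class name, the method names, and the
selected/selectedInUse parameters are hypothetical, and null handling is left
out. A DoubleUnaryOperator stands in for the function that a build-time
template would otherwise substitute into a generated class, so the loop may
not perform as well as generated code.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; not Hive's actual expression classes.
final class UnaryDoubleFuncSketch {

    // Apply func to n entries of a double column. When selectedInUse is true,
    // only the row indexes listed in selected[0..n-1] are evaluated.
    // The function is called directly in the inner loop and nothing is allocated.
    static void evaluate(DoubleUnaryOperator func, double[] in, double[] out,
                         int[] selected, boolean selectedInUse, int n) {
        if (selectedInUse) {
            for (int j = 0; j < n; j++) {
                int i = selected[j];
                out[i] = func.applyAsDouble(in[i]);
            }
        } else {
            for (int i = 0; i < n; i++) {
                out[i] = func.applyAsDouble(in[i]);
            }
        }
    }

    // Implicit cast from a non-double input (here long) to double, so that a
    // single double implementation can serve integer columns as well.
    static void castLongToDouble(long[] in, double[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = (double) in[i];
        }
    }
}
```

For example, evaluate(Math::sqrt, in, out, null, false, n) would compute
Sqrt(Col) over a dense batch, and the same loop serves Exp, Ln, Sin and the
other one-argument functions by passing a different method reference.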
