[ 
https://issues.apache.org/jira/browse/HIVE-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1751:
------------------------------

    Attachment: HIVE-1751.1.patch

ExprNodeColumnEvaluator.evaluate() is a very heavily used function. For most 
queries, it is called multiple times per row. In the group-by query in the 
benchmark, it is called as many as 10 times per row.

This function call sometimes takes 17%-20% of CPU time. Usually 
ExprNodeColumnEvaluator.evaluate() itself takes 2%-3%, 
UnionStructObjectInspector.getStructFieldData() itself takes 2%-3%, and 
ColumnarStruct.uncheckedGetField() itself takes 3%.

It's hard to come up with a general solution that reduces the costs in a 
structural way. I tried several small code rewrites in the hope of getting 
slight improvements:

1. nullSequence is no longer passed in on every call; it is set once in the 
constructor.
2. Restructure ColumnarStruct a little bit.
3. In ExprNodeColumnEvaluator, special-case the single-level reference, which 
is the common case most of the time when referring to a column.
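
To illustrate item 3, here is a minimal sketch of what a single-level special 
case could look like. It is not the code in the patch: FieldReader, 
ListFieldReader and ColumnEvaluatorSketch are hypothetical stand-ins for the 
object inspector and evaluator, and rows are modeled as plain java.util.List 
values to keep the example self-contained.

```java
import java.util.List;

// Hypothetical stand-in for a struct object inspector; not Hive's actual API.
interface FieldReader {
    Object getField(Object row, int index);
}

// Reads a field from a row modeled as a java.util.List (an assumption made
// only for this sketch).
final class ListFieldReader implements FieldReader {
    @Override
    public Object getField(Object row, int index) {
        return ((List<?>) row).get(index);
    }
}

// Hypothetical evaluator: resolves a column reference that may be nested
// (for example a.b.c), one reader and one field index per level.
final class ColumnEvaluatorSketch {
    private final FieldReader[] readers;
    private final int[] fieldIndexes;

    ColumnEvaluatorSketch(FieldReader[] readers, int[] fieldIndexes) {
        if (readers.length != fieldIndexes.length || readers.length == 0) {
            throw new IllegalArgumentException("need one reader per level");
        }
        this.readers = readers;
        this.fieldIndexes = fieldIndexes;
    }

    Object evaluate(Object row) {
        // Fast path: a plain column reference has exactly one level, so the
        // loop setup and per-iteration bookkeeping are skipped entirely.
        if (fieldIndexes.length == 1) {
            return readers[0].getField(row, fieldIndexes[0]);
        }
        // General path: walk down the nested structure level by level.
        Object current = row;
        for (int i = 0; i < fieldIndexes.length; i++) {
            current = readers[i].getField(current, fieldIndexes[i]);
        }
        return current;
    }
}
```

Both paths return the same value; the fast path only avoids the generic loop 
for the common single-level case, at the cost of a second code path to 
maintain.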

When trying to optimize functions that already take only 3%, it's hard to 
verify the performance improvement, since experiments show slight variation 
every time anyway.

For 1 and 2, I think they make the code more readable anyway. I ran it many 
times and consistently saw about a 1% improvement too.
3 might make the code less readable, but I see about a 5% improvement for a 
simple group-by query.


> Optimize ColumnarStructObjectInspector.getStructFieldData()
> -----------------------------------------------------------
>
>                 Key: HIVE-1751
>                 URL: https://issues.apache.org/jira/browse/HIVE-1751
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-1751.1.patch
>
>
> ColumnarStructObjectInspector.getStructFieldData() is a heavily used function 
> and is expensive.
> Optimizing this function, including the ColumnarStruct.uncheckedGetField() 
> call it makes, can benefit most queries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
