[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

Siying Dong (JIRA) Thu, 21 Oct 2010 03:09:44 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Siying Dong updated HIVE-1738:
------------------------------

    Attachment: HIVE.1738.1.patch

I compared performance for a simple group-by query:

   SELECT col1,col2,col3,col4,count(1)
   FROM source_table
   GROUP BY col1, col2, col3, col4

which returns about 1000 rows.
The input has about 736M rows in about 19GB of compressed files. The query 
started 67 mappers.

Map-side CPU time (the CPU MILLISECONDS counter) dropped from about 2,950,000 
to about 2,800,000, roughly a 5% reduction. I ran each query at least 5 times, 
and the improvement was consistent in every run.

> Optimize Key Comparison in GroupByOperator
> ------------------------------------------
>
>                 Key: HIVE-1738
>                 URL: https://issues.apache.org/jira/browse/HIVE-1738
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE.1738.1.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys. That 
> method is written for generalized object comparisons and is not optimized 
> for the group-by operator. By optimizing this logic, we expect to see a 
> noticeable improvement in GroupByOperator performance.
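
To illustrate the distinction drawn in the description, here is a minimal Java sketch. It is not the attached patch and not Hive's code: genericCompare is a simplified stand-in for a generalized comparator, sameGroupKey is a hypothetical specialized check, and both names are assumptions made for this example. The sketch assumes the operator only needs to know whether the key changed between consecutive rows, so an equality test over a fixed set of key columns is enough and no ordering has to be computed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class GroupKeyComparison {

    // Stand-in for a generalized comparison: it dispatches on the runtime
    // type of every value and computes a full ordering, not just equality.
    static int genericCompare(Object a, Object b) {
        if (a == null || b == null) {
            return a == b ? 0 : (a == null ? -1 : 1);
        }
        if (a instanceof List && b instanceof List) {
            List<?> la = (List<?>) a;
            List<?> lb = (List<?>) b;
            int n = Math.min(la.size(), lb.size());
            for (int i = 0; i < n; i++) {
                int c = genericCompare(la.get(i), lb.get(i));
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(la.size(), lb.size());
        }
        if (a instanceof Comparable && a.getClass() == b.getClass()) {
            @SuppressWarnings("unchecked")
            Comparable<Object> ca = (Comparable<Object>) a;
            return ca.compareTo(b);
        }
        throw new IllegalArgumentException("unsupported key types");
    }

    // Hypothetical specialized check: the number of key columns is fixed,
    // and only "same key or not" matters, so it stops at the first mismatch.
    static boolean sameGroupKey(Object[] previous, Object[] current) {
        for (int i = 0; i < previous.length; i++) {
            if (!Objects.equals(previous[i], current[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Object[] row1 = {"a", 1, "x", null};
        Object[] row2 = {"a", 1, "x", null};
        Object[] row3 = {"a", 2, "x", null};

        // Generalized path: wraps the keys and compares them as ordered lists.
        System.out.println(genericCompare(Arrays.asList(row1), Arrays.asList(row2)) == 0);
        // Specialized path: direct column-by-column equality.
        System.out.println(sameGroupKey(row1, row2));
        System.out.println(sameGroupKey(row1, row3));
    }
}
```

Whether the patch takes this exact approach is not stated in this message; the sketch only shows why a comparison tailored to group-by keys can do less work per row than a general-purpose one.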

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
