[ https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1738:
------------------------------
    Attachment: HIVE.1738.1.patch

I compared performance for a simple group-by query:

SELECT col1, col2, col3, col4, count(1) FROM source_table GROUP BY col1, col2, col3, col4

which returns about 1000 rows. The input has about 736M rows in about 19GB of compressed files. The query started 67 mappers. The map tasks' CPU MILLISECONDS dropped from about 2,950,000 to about 2,800,000, a reduction of roughly 5%. I ran each query at least 5 times, and the improvement was consistent in every run.

> Optimize Key Comparison in GroupByOperator
> ------------------------------------------
>
>                 Key: HIVE-1738
>                 URL: https://issues.apache.org/jira/browse/HIVE-1738
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE.1738.1.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys. That
> method is written for generalized object comparison and is not optimized for
> the group-by operator. By optimizing this logic, we expect to see a noticeable
> improvement in GroupByOperator.
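
For illustration only, here is a minimal sketch of why a general-purpose key comparison can cost more than one specialized for a fixed group-by key schema. It does not use Hive classes and is not the code in HIVE.1738.1.patch. The names KeyComparisonSketch, Kind, FieldEq, compile, genericEquals and specializedEquals are hypothetical. It assumes the group-by only needs to know whether two keys are equal, and that the key column types are known before the first row arrives.

```java
/** Hypothetical illustration; not the code in HIVE.1738.1.patch. */
public class KeyComparisonSketch {

    /** Column kinds used only in this sketch. */
    enum Kind { LONG, STRING }

    /** Generic path: re-inspects the runtime type of every field on every call. */
    static boolean genericEquals(Object[] a, Object[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            Object x = a[i], y = b[i];
            if (x == null || y == null) {
                if (x != y) return false;          // equal only if both are null
            } else if (x instanceof Long && y instanceof Long) {
                if (((Long) x).longValue() != ((Long) y).longValue()) return false;
            } else if (x instanceof String && y instanceof String) {
                if (!x.equals(y)) return false;
            } else {
                return false;                      // mismatched or unsupported types
            }
        }
        return true;
    }

    /** Per-column equality check, chosen once when the key schema is known. */
    interface FieldEq { boolean eq(Object x, Object y); }

    /** Resolves the type dispatch once per query instead of once per row. */
    static FieldEq[] compile(Kind[] schema) {
        FieldEq[] eqs = new FieldEq[schema.length];
        for (int i = 0; i < schema.length; i++) {
            switch (schema[i]) {
                case LONG:
                    eqs[i] = (x, y) -> x == null ? y == null
                            : y != null && ((Long) x).longValue() == ((Long) y).longValue();
                    break;
                default: // STRING
                    eqs[i] = (x, y) -> x == null ? y == null : x.equals(y);
                    break;
            }
        }
        return eqs;
    }

    /** Specialized path: no per-row type inspection, early exit on first mismatch. */
    static boolean specializedEquals(FieldEq[] eqs, Object[] a, Object[] b) {
        for (int i = 0; i < eqs.length; i++) {
            if (!eqs[i].eq(a[i], b[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        FieldEq[] eqs = compile(new Kind[] { Kind.LONG, Kind.STRING });
        Object[] k1 = { 1L, "a" };
        Object[] k2 = { 1L, "a" };
        Object[] k3 = { 2L, "a" };
        System.out.println(specializedEquals(eqs, k1, k2));
        System.out.println(specializedEquals(eqs, k1, k3));
    }
}
```

Both paths return the same answer. The difference is that the specialized one does the type dispatch once in compile() rather than for every field of every row, which matters when the comparison runs hundreds of millions of times.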