[ https://issues.apache.org/jira/browse/HIVE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763129#comment-13763129 ]
Yin Huai commented on HIVE-5258:
--------------------------------

Handling aggregations with the DISTINCT keyword will be tricky. The single reducer may be overwhelmed when the columns in a DISTINCT aggregate have many distinct values.

> Optimize aggregations without Group By followed by a Cross Join
> ---------------------------------------------------------------
>
>                 Key: HIVE-5258
>                 URL: https://issues.apache.org/jira/browse/HIVE-5258
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> For example, we should use a single MR job to execute the following query:
> {code:sql}
> SELECT *
> FROM (SELECT tmp1.cnt1, tmp2.cnt2
>       FROM (SELECT count(*) as cnt1
>             FROM src1 x) tmp1
>       JOIN (SELECT count(*) as cnt2
>             FROM src1 y) tmp2) tmp3;
> {code}
> The reduce phase should have the reduce-side GroupByOperators of tmp1 and
> tmp2, and the JoinOperator for the cross join.
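The proposed single-job plan, and the DISTINCT concern raised in the comment, can be illustrated with a toy simulation. This is a hypothetical Python model, not Hive code: `run_single_job` and its parameters are invented for illustration, and Hive's real operators (map-side hash aggregation, the shuffle key used for DISTINCT) differ in detail. Because neither subquery has a GROUP BY key, both branches send tagged records to one reducer, which runs one group-by per tag and then cross joins the two single-row results. With count(*), map-side partial aggregation sends one record per input split; with count(DISTINCT col), each mapper can only deduplicate its own split, so the reducer's input grows with the number of distinct values.

```python
from itertools import product


def run_single_job(splits_x, splits_y, distinct=False):
    """Toy model of one MR job computing two global aggregates plus a cross join.

    Hypothetical illustration only. splits_x / splits_y are lists of input
    splits (each a list of column values) for the subqueries tmp1 and tmp2.
    distinct=False computes count(*); distinct=True computes count(DISTINCT col).
    Returns (joined_rows, reducer_input_size).
    """
    shuffled = []  # no GROUP BY key, so every record goes to ONE reducer
    for tag, splits in (("tmp1", splits_x), ("tmp2", splits_y)):
        for split in splits:
            if distinct:
                # Map side can only deduplicate within its own split.
                shuffled.extend((tag, value) for value in set(split))
            else:
                # Map-side partial aggregation: one partial count per split.
                shuffled.append((tag, len(split)))

    # Reduce side: one group-by per tag (the two GroupByOperators).
    if distinct:
        seen = {"tmp1": set(), "tmp2": set()}
        for tag, value in shuffled:
            seen[tag].add(value)
        agg = {tag: len(values) for tag, values in seen.items()}
    else:
        agg = {"tmp1": 0, "tmp2": 0}
        for tag, partial in shuffled:
            agg[tag] += partial

    # Cross join of two single-row results (the JoinOperator): one output row.
    joined = list(product([agg["tmp1"]], [agg["tmp2"]]))
    return joined, len(shuffled)


if __name__ == "__main__":
    splits = [list(range(0, 1000)), list(range(500, 1500))]
    print(run_single_job(splits, splits))                 # ([(2000, 2000)], 4)
    print(run_single_job(splits, splits, distinct=True))  # ([(1500, 1500)], 4000)
```

In this model the count(*) reducer receives four records regardless of table size, while the count(DISTINCT) reducer receives 4000, which is the kind of growth the comment warns about.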