[ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013938#comment-13013938 ]
Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

For a query of the form

  "From table T
   insert overwrite table test1 select col1, count(distinct colx) group by col1
   insert overwrite table test2 select col2, count(distinct colx) group by col2;"

it is not possible to generate a single M/R job, because partitioning the input rows by both col1 and col2 in a single stage does not work here.

If the group-by keys are such that one key set is a subset of the other, i.e. of the following form:

  "From table T
   insert overwrite table test1 select col1, count(distinct colx) group by col1
   insert overwrite table test2 select col1, col2, count(distinct colx) group by col1, col2;"

we can run it in a single M/R job by spraying over the common group-by key set (i.e. col1).

I will implement this and see if it reduces query execution time. Thoughts?

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
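
To illustrate the subset case from the comment, here is a minimal sketch in Python. It is not Hive code and does not reflect how Hive would implement this; the function name, the reducer count, and the sample rows are all hypothetical. It only shows why spraying rows over the common key col1 is enough: every row of a given (col1) group, and therefore of every (col1, col2) group, lands in the same partition, so one reduce stage can compute both distinct counts.

```python
from collections import defaultdict


def single_stage_multi_groupby(rows, num_reducers=3):
    """Simulate one map/reduce stage over rows of (col1, col2, colx).

    Hypothetical helper for illustration only, not a Hive API.
    Returns (test1, test2) where
      test1[col1]         = count(distinct colx) group by col1
      test2[(col1, col2)] = count(distinct colx) group by col1, col2
    """
    # "Map" side: spray each row over the common group-by key (col1) only.
    partitions = defaultdict(list)
    for col1, col2, colx in rows:
        partitions[hash(col1) % num_reducers].append((col1, col2, colx))

    test1, test2 = {}, {}
    # "Reduce" side: each partition is processed independently.
    for part in partitions.values():
        by_col1 = defaultdict(set)
        by_col1_col2 = defaultdict(set)
        for col1, col2, colx in part:
            by_col1[col1].add(colx)
            by_col1_col2[(col1, col2)].add(colx)
        # A group never spans two partitions, so results can be emitted
        # directly without a second merge stage.
        for key, values in by_col1.items():
            test1[key] = len(values)
        for key, values in by_col1_col2.items():
            test2[key] = len(values)
    return test1, test2


# Hypothetical sample rows: (col1, col2, colx)
sample_rows = [
    ("a", "x", 1),
    ("a", "x", 1),
    ("a", "y", 2),
    ("b", "x", 1),
    ("b", "x", 3),
]
```

The same approach does not carry over to the first query (group by col1 in one insert and by col2 in the other): with rows partitioned on col1, the rows of a single col2 group can be spread across several partitions, so no one reducer sees all of that group's colx values.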