[ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966046#action_12966046 ]
Sreekanth Ramakrishnan commented on HIVE-1695:
----------------------------------------------

Group By operator Plan:
{noformat}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF test a) (TOK_TABREF test1 b) (= (. (TOK_TABLE_OR_COL a) key) (. (TOK_TABLE_OR_COL b) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_HINTLIST (TOK_HINT TOK_MAPJOIN (TOK_HINTARGLIST b))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) key))) (TOK_GROUPBY (. (TOK_TABLE_OR_COL a) key))))

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-4
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        b
          TableScan
            alias: b
            HashTable Sink Operator
              condition expressions:
                0 {key}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 0

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0
              Position of Big Table: 0
              Reduce Output Operator
                key expressions:
                      expr: _col0
                      type: int
                sort order: +
                Map-reduce partition columns:
                      expr: _col0
                      type: int
                tag: -1
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Group By Operator
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: int
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: int
            outputColumnNames: _col0
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{noformat}

I have successfully got group by and order by working with the new NodeProcessor which I implemented, and cross-checked the query results from before and after the plan alterations were made.
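
For readers following the plan above, here is a rough toy model of the effect being described: a map-only job that is immediately followed by a map-reduce job gets folded into that job's map side, so the Map Join Operator feeds the Reduce Output Operator directly, as in Stage-1. This is only an illustrative sketch and not Hive code; the `Job` class and `merge_map_only_jobs` function are hypothetical names, and the intermediate file write and read between the two original jobs are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """Hypothetical stand-in for one MapReduce job: operator names only."""
    map_ops: List[str]
    reduce_ops: List[str] = field(default_factory=list)


def merge_map_only_jobs(jobs: List[Job]) -> List[Job]:
    """Fold each map-only job into the map side of the job that follows it."""
    merged: List[Job] = []
    pending: List[str] = []  # map operators carried over from map-only jobs
    for job in jobs:
        if not job.reduce_ops:
            # Map-only job: hold its operators for the next job's map side.
            pending.extend(job.map_ops)
        else:
            merged.append(Job(pending + job.map_ops, list(job.reduce_ops)))
            pending = []
    if pending:
        # Trailing map-only work has nothing to merge into; keep it as a job.
        merged.append(Job(pending))
    return merged


# Before: one map-only job (the map join), then one map-reduce job (group by).
two_jobs = [
    Job(["TableScan a", "Map Join Operator"]),
    Job(["Reduce Output Operator"],
        ["Group By Operator", "Select Operator", "File Output Operator"]),
]

merged = merge_map_only_jobs(two_jobs)
for job in merged:
    print(job)
```

In this toy model the two input jobs collapse into one job whose map side holds the table scan, the map join, and the reduce output operator, with the group by on the reduce side.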
> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map
> only job followed by a Map-Reduce job. It can be combined into single
> MapReduce Job.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.