Vineet Garg created HIVE-14442:
----------------------------------

             Summary: CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false
                 Key: HIVE-14442
                 URL: https://issues.apache.org/jira/browse/HIVE-14442
             Project: Hive
          Issue Type: Sub-task
          Components: CBO
            Reporter: Vineet Garg
            Assignee: Vineet Garg

Reproducer

{code}
set hive.cbo.returnpath.hiveop=true;
{code}

{code}
set hive.map.aggr=false;
{code}

{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code}
explain select count(distinct a) from abcd group by b;
{code}

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: b, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}

{code}
explain select count(distinct a) from abcd group by c;
{code}

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: c, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}

In both cases above, the map-side Reduce Output Operator has the wrong keys: both plans use a, a instead of b, a and c, a respectively.
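
To illustrate why shuffling on a, a instead of b, a changes the result, here is a minimal Python sketch. It is not Hive code: the rows are hypothetical (they are not the contents of in4.txt), and the helper `count_distinct_by_key` is an invented stand-in that only mimics a reduce-side count(DISTINCT ...) which groups on the first shuffle key and counts distinct values of the second.

```python
from collections import defaultdict

# Hypothetical (a, b) rows for illustration only; not the contents of in4.txt.
ROWS = [(1, 10), (2, 10), (2, 10), (3, 20), (1, 20)]
A, B = 0, 1  # column positions within each row


def count_distinct_by_key(rows, group_col, distinct_col):
    """Group rows on group_col and count distinct distinct_col values per group.

    Returns the per-group counts in sorted order so results are easy to compare.
    """
    groups = defaultdict(set)
    for row in rows:
        groups[row[group_col]].add(row[distinct_col])
    return sorted(len(values) for values in groups.values())


# Keys b, a: what "count(distinct a) ... group by b" needs.
expected = count_distinct_by_key(ROWS, group_col=B, distinct_col=A)

# Keys a, a: what the plans in this report show.
buggy = count_distinct_by_key(ROWS, group_col=A, distinct_col=A)

print("keys b, a:", expected)
print("keys a, a:", buggy)
```

With keys a, a every group holds exactly one distinct value of a, so in this model both the number of output rows and the counts differ from the b, a case.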