Vineet Garg created HIVE-21387:
----------------------------------

             Summary: Wrong result for UNION query with GROUP BY consisting of PK columns
                 Key: HIVE-21387
                 URL: https://issues.apache.org/jira/browse/HIVE-21387
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 4.0.0
            Reporter: Vineet Garg
            Assignee: Vineet Garg

*Reproducer*
{code:sql}
create table t1(i int primary key disable rely, j int);
insert into t1 values(1,100),(2,200);

create table t2(i int primary key disable rely, j int);
insert into t2 values(2,1000),(4,500);

select i from (select i, j from t1 union all select i,j from t2) subq group by i,j;
{code}

*Expected Result*
{noformat}
2
2
4
1
{noformat}

*Actual Result*
{noformat}
1
2
4
{noformat}

*CBO Plan*
{code:sql}
HiveAggregate(group=[{0}])
  HiveProject(i=[$0], j=[$1])
    HiveUnion(all=[true])
      HiveProject(i=[$0], j=[$1])
        HiveTableScan(table=[[default, t1]], table:alias=[t1])
      HiveProject(i=[$0], j=[$1])
        HiveTableScan(table=[[default, t2]], table:alias=[t2])
{code}

This is caused by the group-by key reduction logic dropping keys when it shouldn't: the plan groups on i alone (group=[{0}]) although the query groups by i,j. After the UNION, i is no longer unique across the combined rows (the group-by keys are not PK/UNIQUE anymore), so i does not functionally determine j. Therefore we shouldn't try to reduce the group-by keys at all in this case.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
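
To illustrate why dropping j from the group-by keys changes the answer, here is a minimal sketch that runs the reproducer against SQLite through Python's standard sqlite3 module. SQLite only stands in as a reference engine here; this is not Hive and does not exercise Hive's optimizer. The `disable rely` clause is omitted because SQLite does not support it, and the names `conn`, `union`, `full_keys` and `reduced_keys` are illustrative assumptions, not anything from the Hive code base.

```python
import sqlite3

# In-memory SQLite database used as a stand-in reference engine (assumption:
# plain SQL semantics are enough to show the expected result).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1(i int primary key, j int);
    insert into t1 values (1,100),(2,200);
    create table t2(i int primary key, j int);
    insert into t2 values (2,1000),(4,500);
""")

# The same UNION ALL subquery as in the reproducer.
union = "(select i, j from t1 union all select i, j from t2) subq"

# Grouping by both keys, as the query was written.
full_keys = sorted(
    row[0] for row in conn.execute(f"select i from {union} group by i, j")
)

# Grouping by i only, which is what the reduced plan (group=[{0}]) computes.
reduced_keys = sorted(
    row[0] for row in conn.execute(f"select i from {union} group by i")
)

print(full_keys)     # [1, 2, 2, 4]
print(reduced_keys)  # [1, 2, 4]
```

The value i = 2 appears in both tables with different j values, so (2,200) and (2,1000) form two groups under `group by i,j` but collapse into one under `group by i`. The reduction is only safe while i is still unique, which holds for each table on its own but not for their union.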