Song Jun created HIVE-19294: ------------------------------- Summary: grouping sets when contains a constant column Key: HIVE-19294 URL: https://issues.apache.org/jira/browse/HIVE-19294 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.3.2 Reporter: Song Jun
We have different results between Hive-1.2.2 and Hive-2.3.2, SQL like this: {code:java} select case when a='all' then 'x' when b=1 then 'y' else 'z' end, c from ( select a,b,count(1) as c from ( select 'all' as a,b from test ) t1 group by a,b grouping sets(a,b) ) t2; {code} We have a grouping sets using the column a which is a contant value 'all' in its subquery. The result of Hive 1.2.2(same result when set hive.cbo.enable to true or false): {code:java} x 3 y 2 z 1 {code} The result of Hive 2.3.2(same result when set hive.cbo.enable to true or false): {code:java} x 3 x 2 x 1{code} I dig it out for Hive 2.3.2 and set hive.cbo.enable=false, I found it that the optimizer ConstantPropagate optimize the code according to the constant column value 'all' in the subquery: {code:java} case when a='all' then 'x' when b=1 then 'y' else 'z' end {code} to {code:java} Select Operator expressions: CASE WHEN (true) THEN ('x') WHEN ((_col1 = 1)) THEN ('y') ELSE ('z') END (type: string), _col3 (type: bigint) outputColumnNames: _col0, _col1 Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE {code} That is case when a = 'all' explained as case when (true), so we always has the value of 'x'. So, which should be right for the above query case? -- This message was sent by Atlassian JIRA (v7.6.3#76005)