Song Jun created HIVE-19294:
-------------------------------

             Summary: grouping sets when contains a constant column
                 Key: HIVE-19294
                 URL: https://issues.apache.org/jira/browse/HIVE-19294
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 2.3.2
            Reporter: Song Jun


We have different results between Hive-1.2.2 and Hive-2.3.2, SQL like this:
{code:java}
select 
    case when a='all' then 'x' 
         when b=1 then 'y' 
         else 'z' 
    end, c 
from ( 
    select 
         a,b,count(1) as c
    from ( 
         select 
            'all' as a,b 
         from test 
    ) t1 group by a,b grouping sets(a,b) 
) t2;
{code}
We have a grouping sets using the column a which is a contant value 'all' in 
its subquery.

 

The result of Hive 1.2.2(same result when set hive.cbo.enable to true or false):
{code:java}
x 3
y 2
z 1   {code}
The result of Hive 2.3.2(same result when set hive.cbo.enable to true or false):
{code:java}
x 3
x 2
x 1{code}
I dig it out for Hive 2.3.2 and set hive.cbo.enable=false, I found it that the 
optimizer 

ConstantPropagate optimize the code according to the constant column value 
'all' in the subquery:
{code:java}
case when a='all' then 'x' 
         when b=1 then 'y' 
         else 'z' 
end
{code}
to 
{code:java}
Select Operator
expressions: CASE WHEN (true) THEN ('x') WHEN ((_col1 = 1)) THEN ('y') ELSE 
('z') END (type: string), _col3 (type: bigint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE
{code}
That is case when a = 'all'  explained as case when (true), so we always has 
the value of 'x'.

 

So, which should be right for the above query case? 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to