[ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009292#comment-15009292
 ] 

Takahiko Saito commented on HIVE-12435:
---------------------------------------

Thanks for the info, [~gopalv]. That explains why the above passes with hive v 
1.2.

There is also another query which fails:
I have the table:
{noformat}
select bool0, key, num0, num1 from calcs;
NULL    key09   NULL    12.4
true    key10   NULL    10.32
false   key11   NULL    2.47
NULL    key12   NULL    12.05
NULL    key13   NULL    10.37
true    key14   NULL    7.1
false   key15   NULL    16.81
NULL    key16   NULL    7.12
true    key00   12.3    8.42
false   key01   -12.3   6.71
NULL    key02   15.7    9.78
true    key03   -15.7   7.43
false   key04   3.5     9.05
NULL    key05   -3.5    9.38
true    key06   0.0     16.42
false   key07   NULL    11.38
NULL    key08   10.0    9.47
{noformat}

When I run the following with vectorization enabled, it returns the following:
{noformat}
SELECT bool0, key, SUM((CASE WHEN bool0 THEN CAST(num0 AS DOUBLE) WHEN NOT 
bool0 THEN CAST(num1 AS DOUBLE) ELSE NULL END)) FROM Calcs GROUP BY bool0, key, 
num1;
NULL     key02  0.0
NULL key05      0.0
NULL     key08  0.0
NULL     key09  0.0
NULL     key12  0.0
NULL     key13  0.0
NULL     key16  0.0
false    key01  6.71
false    key04  9.05
false    key07  11.38
false    key11  2.47
false    key15  16.81
true    key00   12.3
true    key03   -15.7
true    key06   0.0
true    key10   NULL
true    key14   NULL
{noformat}

When vectorization is disabled, it returns NULL for the sum instead of 0 when 
bool0=NULL :
{noformat}
NULL     key02  NULL
NULL     key05  NULL
NULL     key08  NULL
NULL     key09  NULL
NULL     key12  NULL
NULL     key13  NULL
NULL     key16  NULL
false    key01  6.71
false    key04  9.05
false    key07  11.38
false    key11  2.47
false    key15  16.81
true     key00  12.3
true     key03  -15.7
true     key06  0.0
true     key10  NULL
true     key14  NULL
{noformat}

Please let me know in a case if this needs to be addressed in another JIRA.

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12435
>                 URL: https://issues.apache.org/jira/browse/HIVE-12435
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.0.0
>            Reporter: Takahiko Saito
>            Assignee: Gopal V
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to