[ https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009292#comment-15009292 ]
Takahiko Saito commented on HIVE-12435: --------------------------------------- Thanks for the info, [~gopalv]. That explains why the above passes with hive v 1.2. There is also another query which fails: I have the table: {noformat} select bool0, key, num0, num1 from calcs; NULL key09 NULL 12.4 true key10 NULL 10.32 false key11 NULL 2.47 NULL key12 NULL 12.05 NULL key13 NULL 10.37 true key14 NULL 7.1 false key15 NULL 16.81 NULL key16 NULL 7.12 true key00 12.3 8.42 false key01 -12.3 6.71 NULL key02 15.7 9.78 true key03 -15.7 7.43 false key04 3.5 9.05 NULL key05 -3.5 9.38 true key06 0.0 16.42 false key07 NULL 11.38 NULL key08 10.0 9.47 {noformat} When I run the following with vectorization enabled, it returns the following: {noformat} SELECT bool0, key, SUM((CASE WHEN bool0 THEN CAST(num0 AS DOUBLE) WHEN NOT bool0 THEN CAST(num1 AS DOUBLE) ELSE NULL END)) FROM Calcs GROUP BY bool0, key, num1; NULL key02 0.0 NULL key05 0.0 NULL key08 0.0 NULL key09 0.0 NULL key12 0.0 NULL key13 0.0 NULL key16 0.0 false key01 6.71 false key04 9.05 false key07 11.38 false key11 2.47 false key15 16.81 true key00 12.3 true key03 -15.7 true key06 0.0 true key10 NULL true key14 NULL {noformat} When vectorization is disabled, it returns NULL for the sum instead of 0 when bool0=NULL : {noformat} NULL key02 NULL NULL key05 NULL NULL key08 NULL NULL key09 NULL NULL key12 NULL NULL key13 NULL NULL key16 NULL false key01 6.71 false key04 9.05 false key07 11.38 false key11 2.47 false key15 16.81 true key00 12.3 true key03 -15.7 true key06 0.0 true key10 NULL true key14 NULL {noformat} Please let me know in a case if this needs to be addressed in another JIRA. > SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and > vectorization is enabled. > ------------------------------------------------------------------------------------------------------ > > Key: HIVE-12435 > URL: https://issues.apache.org/jira/browse/HIVE-12435 > Project: Hive > Issue Type: Bug > Components: Vectorization > Affects Versions: 2.0.0 > Reporter: Takahiko Saito > Assignee: Gopal V > > Run the following query: > {noformat} > create table count_case_groupby (key string, bool boolean) STORED AS orc; > insert into table count_case_groupby values ('key1', true),('key2', > false),('key3', NULL),('key4', false),('key5',NULL); > {noformat} > The table contains the following: > {noformat} > key1 true > key2 false > key3 NULL > key4 false > key5 NULL > {noformat} > The below query returns: > {noformat} > SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) > AS cnt_bool0_ok FROM count_case_groupby GROUP BY key; > key1 1 > key2 1 > key3 1 > key4 1 > key5 1 > {noformat} > while it expects the following results: > {noformat} > key1 1 > key2 1 > key3 0 > key4 1 > key5 0 > {noformat} > The query works with hive ver 1.2. Also it works when a table is not orc > format. > Also even if it's an orc table, when vectorization is disabled, the query > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)