Mala Chikka Kempanna created HIVE-7248:
------------------------------------------

             Summary: UNION ALL in hive returns incorrect results on Hbase backed table
                 Key: HIVE-7248
                 URL: https://issues.apache.org/jira/browse/HIVE-7248
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.13.1, 0.13.0, 0.12.0
            Reporter: Mala Chikka Kempanna


The issue can be recreated with the following steps:

1) In hbase, create the table:

    create 'TABLE_EMP','default'

2) On hive, start the CLI and create the external table:

    sudo -u hive hive

    CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES("hbase.columns.mapping" = "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key",
                         "hbase.scan.cache" = "500",
                         "hbase.scan.cacheblocks" = "false" )
    TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');

3) On hbase, insert the following data:

    put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
    put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
    put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
    put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
    put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
    put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

4) On hive, execute the following query:

    hive> SELECT * FROM (
            SELECT CDS_PK FROM TABLE_EMP
            WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL
            UNION ALL
            SELECT CDS_PK FROM TABLE_EMP
            WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL
          )t ;

5) Output of the query:

    1
    1
    2
    2

6) Output of just

    SELECT CDS_PK FROM TABLE_EMP
    WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL

   is

    1
    2

7) Output of just

    SELECT CDS_PK FROM TABLE_EMP
    WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL

   is

    Empty

8) UNION is used to combine the result from multiple SELECT statements into a single result set.
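
For reference, the bag-union behaviour expected from UNION ALL can be sketched outside Hive. The Python below is only an illustration and does not touch Hive or HBase: the two lists are the branch outputs reported in steps 6 and 7, and union_all is a hypothetical helper name, not a Hive API. It computes what the combined query should return and which rows in the reported output of step 5 are surplus.

```python
from collections import Counter

# Branch results as reported in steps 6 and 7 of this report.
numeric_branch = ["1", "2"]  # CDS_PK >= '0' AND CDS_PK <= '9'
alpha_branch = []            # CDS_PK >= 'a' AND CDS_PK <= 'z' (empty)


def union_all(*branches):
    """Bag union: concatenate the branch results and keep duplicates."""
    rows = []
    for branch in branches:
        rows.extend(branch)
    return rows


# What UNION ALL should produce from the two branches.
expected = union_all(numeric_branch, alpha_branch)

# Output reported in step 5.
observed = ["1", "1", "2", "2"]

# Rows that occur more often in the observed output than bag union allows.
surplus = Counter(observed) - Counter(expected)

print("expected:", expected)
print("surplus rows:", dict(surplus))
```

Every row of the first branch shows up once more than it should, which would be consistent with the second branch returning the first branch's rows rather than an empty set. That reading is an inference from the reported output, not a confirmed root cause.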
Hive currently supports only UNION ALL (bag union), in which duplicates are not eliminated. Accordingly, the above query should return

    1
    2

Instead, it returns the wrong output

    1
    1
    2
    2



--
This message was sent by Atlassian JIRA
(v6.2#6252)