Mala Chikka Kempanna created HIVE-7248:
------------------------------------------

             Summary: UNION ALL in hive returns incorrect results on Hbase 
backed table
                 Key: HIVE-7248
                 URL: https://issues.apache.org/jira/browse/HIVE-7248
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.13.1, 0.13.0, 0.12.0
            Reporter: Mala Chikka Kempanna


The issue can be recreated with following steps

1) In hbase 
create 'TABLE_EMP','default' 

2) On hive 
sudo -u hive hive 

CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME 
string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
SERDEPROPERTIES("hbase.columns.mapping" = 
"default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", 
"hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) 
TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'=''); 


3) On hbase insert the following data 

put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' 
put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' 
put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 


put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' 
put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' 
put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 


4) On hive execute the following query 
hive 
SELECT * 
FROM ( 
SELECT CDS_PK 
FROM TABLE_EMP 
WHERE 
CDS_PK >= '0' 
AND CDS_PK <= '9' 
AND CDS_UPDATED_DATE IS NOT NULL 
UNION ALL SELECT CDS_PK 
FROM TABLE_EMP 
WHERE 
CDS_PK >= 'a' 
AND CDS_PK <= 'z' 
AND CDS_UPDATED_DATE IS NOT NULL 
)t ; 


5) Output of the query 

1 
1 
2 
2 

6) Output of just 

SELECT CDS_PK 
FROM TABLE_EMP 
WHERE 
CDS_PK >= '0' 
AND CDS_PK <= '9' 
AND CDS_UPDATED_DATE IS NOT NULL 

is 

1 
2 

7) Output of just 

SELECT CDS_PK 
FROM TABLE_EMP 
WHERE 
CDS_PK >= 'a' 
AND CDS_PK <= 'z' 
AND CDS_UPDATED_DATE IS NOT NULL 

Empty 

8) UNION is used to combine the result from multiple SELECT statements into a 
single result set. Hive currently only supports UNION ALL (bag union), in which 
duplicates are not eliminated 

Accordingly above query should return output 
1 
2 

instead it is giving wrong output 
1 
1 
2 
2



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to