Aleksey Vovchenko created HIVE-16741:
----------------------------------------
Summary: Counting number of records in hive and hbase are
different for NULL fields in hive
Key: HIVE-16741
URL: https://issues.apache.org/jira/browse/HIVE-16741
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 2.1.0, 1.2.0
Reporter: Aleksey Vovchenko
Assignee: Aleksey Vovchenko
Steps to reproduce:
STEP 1.
hbase> create 'testTable',{NAME=>'cf'}
STEP 2.
put 'testTable','10','cf:Address','My Address 411002'
put 'testTable','10','cf:contactId','653638'
put 'testTable','10','cf:currentStatus','Awaiting'
put 'testTable','10','cf:createdAt','1452815193'
put 'testTable','10','cf:Id','10'
put 'testTable','15','cf:contactId','653638'
put 'testTable','15','cf:currentStatus','Awaiting'
put 'testTable','15','cf:createdAt','1452815193'
put 'testTable','15','cf:Id','15'
(Note: the Address column is not provided for row '15', so Hive reads it as NULL.)
put 'testTable','20','cf:Address','My Address 411003'
put 'testTable','20','cf:contactId','653638'
put 'testTable','20','cf:currentStatus','Awaiting'
put 'testTable','20','cf:createdAt','1452815193'
put 'testTable','20','cf:Id','20'
put 'testTable','17','cf:Address','My Address 411003'
put 'testTable','17','cf:currentStatus','Awaiting'
put 'testTable','17','cf:createdAt','1452815193'
put 'testTable','17','cf:Id','17'
STEP 3.
hive> CREATE EXTERNAL TABLE hh_testTable(Id string, Address string,
contactId string, currentStatus string, createdAt string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
("hbase.columns.mapping"=":key,cf:Address,cf:contactId,cf:currentStatus,cf:createdAt")
TBLPROPERTIES ("hbase.table.name"="testTable");
STEP 4.
hive> select count(*),contactid from hh_testTable group by contactid;
Actual result:
OK
3 653638
Expected result:
OK
1 NULL
3 653638
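For reference, the expected grouping can be checked outside Hive with a minimal Python sketch. It only models the data from STEP 2 (row '17' has no cf:contactId cell) and is not Hive or HBase code; the `rows` dictionary and the `count_by` helper are hypothetical names introduced for illustration, and only the Address and contactId columns are modelled.

```python
from collections import Counter

# Rows from STEP 2, keyed by HBase row key. A missing dictionary key
# models an absent HBase cell, which Hive should expose as NULL.
rows = {
    "10": {"Address": "My Address 411002", "contactId": "653638"},
    "15": {"contactId": "653638"},            # no cf:Address cell
    "20": {"Address": "My Address 411003", "contactId": "653638"},
    "17": {"Address": "My Address 411003"},   # no cf:contactId cell
}


def count_by(table, column):
    """Count rows per value of `column`, treating a missing cell as its own
    group (None), the way GROUP BY treats NULL."""
    return Counter(cells.get(column) for cells in table.values())


counts = count_by(rows, "contactId")

# Print the NULL group first, then the remaining values in sorted order.
for value, n in sorted(counts.items(), key=lambda kv: (kv[0] is not None, kv[0] or "")):
    print(n, "NULL" if value is None else value)
```

Under these assumptions the sketch reports one row in the NULL group and three rows for 653638, which is the grouping the query in STEP 4 is expected to return.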