I have an hbase table created as follows:
create 'event', {NAME => 'm', VERSIONS => 1}, {NAME => 'e', VERSIONS => 1}

I have a hive table mapped to the hbase table defined as follows:
CREATE EXTERNAL TABLE h_event(key 
struct<name:string,dateCreated:string,userId:string>, dummy string, dummy2 
string)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,m:dum,e:dum2")
TBLPROPERTIES ("hbase.table.name" = "event");

(You may notice that this is as per the workaround suggested in 
https://issues.apache.org/jira/browse/HIVE-2599 by Swarnim Kulkarni)

I am loading data in the table from a text file by doing an 'INSERT OVERWRITE 
TABLE h_event ...' statement.
Data is loaded successfully without errors and a count 'event' done in hbase 
shell returns the correct number of expected records.

Environment:
I am on hive 0.8.1 and hbase 0.92.0, both running on Debian/Squeeze.

Problem:
"select count(*) from h_event where key.name='abc' " returns incorrect count 
and is always less than the actual number of records having key.name='abc' in 
the table.
"select count(*) , key.name from h_event group by key.name" also returns the 
wrong counts for various values of key.name and the subtotals returned by this 
query do not add up to the total number of records in the table.

"select * from h_event where key.name='abc' " returns the expected number of 
rows with the right keys and data.
"select count(*) from h_event" returns the correct number of rows in the table
"select count(key) from h_event" returns the wrong number, lesser than the 
actual rows

"select dummy from h_event where key.name='abc' " returns correct rows with 
right data
"select key.userId from h_event where key.name='abc' " returns wrong(lesser) 
number of rows

It seems to me that composite keys just don't seem to work in the expected 
manner; if you have any of the key columns in the select clause, the returned 
results are incomplete. Am I missing something obvious here, or is the only 
solution to go back to duplicating key columns again in the table ?

Any help is greatly appreciated.

Thanks
Rupinder



This email is intended for the person(s) to whom it is addressed and may 
contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized use, 
distribution, copying, or disclosure by any person other than the addressee(s) 
is strictly prohibited. If you have received this email in error, please notify 
the sender immediately by return email and delete the message and any 
attachments from your system.

Reply via email to