Swarnim,

Thanks. So this means a custom MapReduce job is the viable option when working 
with HBase tables that have composite keys, since it allows setting the start 
and stop keys. The Hive + HBase combination is out.
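
As a rough illustration of what "setting the start and stop keys" could look like, here is a minimal Python sketch that derives a scan range from the components of a composite key. It assumes the row key is simply the components joined by '~' (the delimiter in the table definition quoted below); the function names are hypothetical and not part of any HBase or Hive API. In a real custom MapReduce job, the resulting byte strings would be set as the start and stop rows of the HBase Scan passed to the job.

```python
def composite_row_key(components, delimiter=b"~"):
    """Join key components into one row key, e.g. name~dateCreated~uid."""
    return delimiter.join(c.encode("utf-8") for c in components)


def prefix_stop_key(prefix):
    """Smallest key that sorts after every key starting with `prefix`.

    Drops trailing 0xFF bytes, then increments the last remaining byte.
    Returns b"" (taken here to mean "scan to the end of the table")
    if no such key exists.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan_range(components, delimiter=b"~"):
    """Return (start_row, stop_row) for a scan; stop_row is exclusive."""
    start = composite_row_key(components, delimiter)
    return start, prefix_stop_key(start)


# All three components known: the range covers that key and anything it prefixes.
print(scan_range(["login", "2013-04-30", "u1"]))

# Only the leading component known: the range covers every key starting with "login".
print(scan_range(["login"]))
```

Because row keys sort lexicographically, a range built from the leading components only works when those components come first in the key; a predicate on just the trailing uid would still need a full scan.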

Regards
Rupinder

From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
Sent: Wednesday, May 01, 2013 12:17 AM
To: user@hive.apache.org
Cc: u...@hbase.apache.org
Subject: Re: Very poor read performance with composite keys in hbase

Rupinder,

Hive supports filter pushdown[1], which means that the predicates in the WHERE 
clause are pushed down to the storage handler level, where they are either 
handled by the storage handler or delegated back to Hive if the handler cannot 
handle them. As of now, the HBaseStorageHandler only supports primitive types. 
So when you use a string as the key, the predicate on it is converted behind 
the scenes into start and stop keys that restrict the HBase scan. This does not 
happen for structs. Hence you see a full table scan, which causes the bad 
performance.

[1] https://cwiki.apache.org/Hive/filterpushdowndev.html
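
To make the difference concrete, here is a toy Python simulation of the two access patterns. It is not Hive or HBase code: the sorted list stands in for a table's row keys, and the helper names are made up for illustration. A scan with no start and stop keys has to examine every row, while a scan restricted to a key range can seek straight to the first candidate.

```python
from bisect import bisect_left


def full_scan(sorted_keys, wanted):
    """Check every row key; returns (matches, rows_examined)."""
    matches = [k for k in sorted_keys if k == wanted]
    return matches, len(sorted_keys)


def range_scan(sorted_keys, start, stop):
    """Seek to `start`, read until `stop` (exclusive); returns (matches, rows_examined)."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi], hi - lo


# Toy table: 10,000 sorted row keys (the table in this thread has about 10 million).
keys = sorted(f"user{i:05d}".encode() for i in range(10_000))
wanted = b"user04242"

# No pushdown: every key is examined to find one row.
print(full_scan(keys, wanted))

# Pushdown: start = the key, stop = the key plus a zero byte, so one key is examined.
print(range_scan(keys, wanted, wanted + b"\x00"))
```

The work done by the full scan grows with the table size, while the range scan only touches the rows inside the requested range.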

On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian 
<sanjay.subraman...@wizecommerce.com> wrote:
In my experience, Hive + HBase has been about 8x slower on average, so I 
went ahead with the Hive-only option.

On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsi...@care.com> wrote:
Hi,

I have an HBase cluster with a table that has a composite key. I map this 
table to a Hive external table, which I use to insert data into and select 
data from it:
CREATE EXTERNAL TABLE event(key struct<name:string,dateCreated:string,uid:string>, {more columns here})
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
TBLPROPERTIES ("hbase.table.name" = "event");

The table has about 10 million rows. When I do a select * using all 3 
components of the key, essentially selecting just 1 row, the response time is 
almost 700 sec, which seems pretty bad.

For comparison, I created another table with a simple string key and the rest 
of the columns the same. The key is a string UUID. The table has the same 
number of column families and the same number of rows.
CREATE EXTERNAL TABLE test_event(key string, blah blah.....
TBLPROPERTIES ("hbase.table.name" = "test_event");

When I select a single row from this table by doing select * where 
key='something', the response time is 35 sec.

This seems to indicate that with composite keys, a full table scan is 
happening. This seems weird.

What am I missing here? Is there something special I need to do to get good 
read performance when using composite keys?
Insert performance in both cases is comparable and as expected.

Any help is appreciated.
Here is the env spec:

Amazon EMR
HBase cluster: 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each. 
Master: 7.5 GB RAM, 2 CPUs of 2.2 GHz each.
Hive cluster: 3 core nodes with 3.75 GB RAM each, 1 CPU of 1.8 GHz each. 
Master: 3.75 GB RAM, 1 CPU of 1.8 GHz.

Thanks
Rupinder






--
Swarnim
