That depends on how dynamic your data is. If it is pretty static, you can also consider using something like Create Table As Select (CTAS) to create a snapshot of your data to HDFS and then run queries on top of that data.
So your query might become something like: create table my_table as select * from event where key.name=’Signup’ and key.dateCreated=’2013-03-06 16:39:55.353’ and key.uid=’7af4c330-5988-4255- 9250-924ce5864e3bf’; Since your data is now in HDFS, this should give you a considerable performance boost. On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <rsi...@care.com> wrote: > Swarnim,**** > > ** ** > > Thanks. So this means custom map reduce is the viable option when working > with hbase tables having composite keys, since it allows to set the start > and stop keys. Hive+Hbase combination is out.**** > > ** ** > > Regards**** > > Rupinder**** > > ** ** > > *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com] > *Sent:* Wednesday, May 01, 2013 12:17 AM > > *To:* user@hive.apache.org > *Cc:* u...@hbase.apache.org > *Subject:* Re: Very poor read performance with composite keys in hbase**** > > ** ** > > Rupinder,**** > > ** ** > > Hive supports a filter pushdown[1] which means that the predicates in the > where clause are pushed down to the storage handler level where either they > get handled by the storage handler or delegated to hive if they cannot > handle them. As of now, the HBaseStorageHandler only supports primitive > types. So when you use strings as keys, behind the scenes they get > converted to start and stop keys and restrict the hbase scan. This does not > happen for structs. Hence you see a full table scan causing bad performance. > **** > > ** ** > > [1] https://cwiki.apache.org/Hive/filterpushdowndev.html**** > > ** ** > > On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian < > sanjay.subraman...@wizecommerce.com> wrote:**** > > My experience with hive + hbase has been about 8x slower on an average. So > I went ahead with hive only option. > > Sent from my iPhone**** > > > On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsi...@care.com> wrote:*** > * > > Hi,**** > > **** > > I have an hbase cluster where I have a table with a composite key. I map > this table to a Hive external table using which I insert/select data > into/from this table:**** > > CREATE EXTERNAL TABLE event(key > struct<name:string,dateCreated:string,uid:string>, {more columns here})*** > * > > ROW FORMAT DELIMITED**** > > COLLECTION ITEMS TERMINATED BY '~'**** > > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'**** > > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")*** > * > > TBLPROPERTIES ("hbase.table.name" = "event");**** > > **** > > The table has about 10 million rows. When I do a select * using all 3 > components of the key, essentially selecting just 1 row, the response time > is almost 700 sec, which seems pretty bad.**** > > **** > > For comparison purpose, I created another table with a simple string key, > and the rest of the columns etc same. The key is a string UUID. Table has > same number of column families and same number of rows.**** > > CREATE EXTERNAL TABLE test_event(key string, blah blah…..**** > > TBLPROPERTIES ("hbase.table.name" = "test_event");**** > > **** > > When I select a single row from this table by doing select * where > key=’something’, the response time is 35 sec.**** > > **** > > This seems to indicate that in case of composite keys, there is a full > table scan happening. This seems weird.**** > > **** > > What am I missing here? Is there something special I need to do to get > good read performance if I am using composite keys ?**** > > Insert performance in both cases is comparable and is as per expectation.* > *** > > **** > > Any help is appreciated.**** > > Here is the env spec:**** > > **** > > Amazon EMR**** > > Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each. > Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each**** > > Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master > 3.75 GB RAM, 1 CPU of 1.8 GHz**** > > **** > > Thanks**** > > Rupinder**** > > ** ** > > ** ** > > This email is intended for the person(s) to whom it is addressed and may > contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized > use, distribution, copying, or disclosure by any person other than the > addressee(s) is strictly prohibited. If you have received this email in > error, please notify the sender immediately by return email and delete the > message and any attachments from your system.**** > > ** ** > > ** ** > > CONFIDENTIALITY NOTICE > ====================== > This email message and any attachments are for the exclusive use of the > intended recipient(s) and may contain confidential and privileged > information. Any unauthorized review, use, disclosure or distribution is > prohibited. If you are not the intended recipient, please contact the > sender by reply email and destroy all copies of the original message along > with any attachments, from your computer system. If you are the intended > recipient, please be advised that the content of this message is subject to > access, review and disclosure by the sender's Email System Administrator.* > *** > > > > **** > > ** ** > > -- > Swarnim **** > -- Swarnim