Currently, the Hive HBase storage handler reads rows one at a time. HIVE-3603 (https://issues.apache.org/jira/browse/HIVE-3603) covers setting the scan cache size, but it has not been fixed yet.
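
To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes each scanner round trip returns at most `caching` rows (the value one would set with `Scan.setCaching` in the HBase client) and ignores the open/close calls. The function name is made up for illustration; it is not a Hive or HBase API.

```python
def scanner_round_trips(total_rows: int, caching: int) -> int:
    """Approximate number of scanner round trips needed to return total_rows
    rows when each round trip carries at most `caching` rows (assumed model)."""
    if caching < 1:
        raise ValueError("caching must be >= 1")
    # Integer ceiling division, avoids floating point.
    return (total_rows + caching - 1) // caching


if __name__ == "__main__":
    rows = 10_000_000  # table size mentioned in the thread
    for caching in (1, 100, 1000):
        print(f"caching={caching}: {scanner_round_trips(rows, caching)} round trips")
```

So on a full scan of a 10 million row table, fetching one row at a time means on the order of 10 million round trips, which is why the cache size matters most when the scan range cannot be narrowed.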
2013/5/1 kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com>:
> That depends on how dynamic your data is. If it is pretty static, you can
> also consider using something like Create Table As Select (CTAS) to create a
> snapshot of your data in HDFS and then run queries on top of that data.
>
> So your query might become something like:
>
> create table my_table as select * from event where key.name='Signup' and
> key.dateCreated='2013-03-06 16:39:55.353' and
> key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';
>
> Since your data is now in HDFS, this should give you a considerable
> performance boost.
>
>
> On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <rsi...@care.com> wrote:
>>
>> Swarnim,
>>
>> Thanks. So this means custom map reduce is the viable option when working
>> with hbase tables having composite keys, since it allows setting the start
>> and stop keys. Hive+Hbase combination is out.
>>
>> Regards
>> Rupinder
>>
>> From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
>> Sent: Wednesday, May 01, 2013 12:17 AM
>> To: user@hive.apache.org
>> Cc: u...@hbase.apache.org
>> Subject: Re: Very poor read performance with composite keys in hbase
>>
>> Rupinder,
>>
>> Hive supports filter pushdown[1], which means that the predicates in the
>> where clause are pushed down to the storage handler, where they are either
>> handled by the storage handler or delegated back to Hive if the handler
>> cannot process them. As of now, the HBaseStorageHandler only supports
>> primitive types. So when you use strings as keys, behind the scenes they get
>> converted to start and stop keys that restrict the hbase scan. This does not
>> happen for structs. Hence you see a full table scan causing bad performance.
>>
>> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>>
>> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian
>> <sanjay.subraman...@wizecommerce.com> wrote:
>>
>> My experience with hive + hbase has been about 8x slower on average. So
>> I went ahead with the hive-only option.
>>
>> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsi...@care.com> wrote:
>>
>> Hi,
>>
>> I have an hbase cluster where I have a table with a composite key. I map
>> this table to a Hive external table, through which I insert/select data
>> into/from this table:
>>
>> CREATE EXTERNAL TABLE event(key
>> struct<name:string,dateCreated:string,uid:string>, {more columns here})
>> ROW FORMAT DELIMITED
>> COLLECTION ITEMS TERMINATED BY '~'
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>> TBLPROPERTIES ("hbase.table.name" = "event");
>>
>> The table has about 10 million rows. When I do a select * using all 3
>> components of the key, essentially selecting just 1 row, the response time
>> is almost 700 sec, which seems pretty bad.
>>
>> For comparison purposes, I created another table with a simple string key,
>> and the rest of the columns etc. the same. The key is a string UUID. The
>> table has the same number of column families and the same number of rows.
>>
>> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>> TBLPROPERTIES ("hbase.table.name" = "test_event");
>>
>> When I select a single row from this table by doing select * where
>> key='something', the response time is 35 sec.
>>
>> This seems to indicate that in the case of composite keys, there is a full
>> table scan happening. This seems weird.
>>
>> What am I missing here? Is there something special I need to do to get
>> good read performance if I am using composite keys?
>>
>> Insert performance in both cases is comparable and is as per expectation.
>>
>> Any help is appreciated.
>>
>> Here is the env spec:
>>
>> Amazon EMR
>> Hbase Cluster: 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each.
>> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each
>> Hive Cluster: 3 core nodes with 3.75 GB RAM each, 1 CPU of 1.8 GHz each.
>> Master 3.75 GB RAM, 1 CPU of 1.8 GHz
>>
>> Thanks
>> Rupinder
>>
>> --
>> Swarnim
>
> --
> Swarnim
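
As a toy illustration of the start/stop key point in the quoted thread, the Python sketch below models a table as a sorted list of row keys and compares a full scan with a scan bounded by a start key and an exclusive stop key. This is only a model of the idea, not the HBaseStorageHandler or the HBase client. All names (`make_key`, `full_scan`, `range_scan`) and the sample data are made up, and joining the key parts with '~' is an assumption taken from the delimiter in the table definition.

```python
from bisect import bisect_left

SEP = "~"  # assumed delimiter, mirroring COLLECTION ITEMS TERMINATED BY '~'


def make_key(name, date_created, uid):
    """Serialize the three key components into one sortable row key."""
    return SEP.join((name, date_created, uid))


def full_scan(sorted_keys, wanted):
    """Compare every row against the wanted key.
    Returns (matching_keys, rows_examined)."""
    matches = [k for k in sorted_keys if k == wanted]
    return matches, len(sorted_keys)


def range_scan(sorted_keys, start_key, stop_key):
    """Touch only rows with start_key <= key < stop_key.
    Returns (keys_in_range, rows_examined)."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    rows = sorted_keys[lo:hi]
    return rows, len(rows)


# 2 names x 10 days x 50 uids = 1000 hypothetical rows, kept in key order.
keys = sorted(
    make_key(name, f"2013-03-{day:02d}", f"uid-{n:04d}")
    for name in ("Login", "Signup")
    for day in range(1, 11)
    for n in range(50)
)

wanted = make_key("Signup", "2013-03-06", "uid-0007")

# Appending the lowest possible character yields the smallest key that sorts
# strictly after `wanted`, so the half-open range holds exactly that one row.
full_matches, full_examined = full_scan(keys, wanted)
range_matches, range_examined = range_scan(keys, wanted, wanted + "\x00")

if __name__ == "__main__":
    print("full scan examined:", full_examined)
    print("range scan examined:", range_examined)
```

Both scans return the same single row, but the bounded scan examines one row while the full scan examines all of them. That is the difference between the string-key case and the struct-key case described above.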