Yes Bejoy, I did that today and it's working. But I was wondering whether we could achieve the same thing by setting some property. Is there anything like that?
Thanks
Ashok

From: Bejoy KS [mailto:bejoy...@yahoo.com]
Sent: 13 September 2012 02:40
To: user@hive.apache.org
Subject: Re: Performance: hive+hbase integration query against the row_key

Hi Ashok

'LOAD DATA INPATH ..' issues an HDFS move under the hood, which is why the original data is no longer present at its HDFS location after the load operation. If you want to preserve the data in some HDFS location and use the same data with Hive, why not create an external table and point it to the required HDFS location?

Regards,
Bejoy KS

________________________________
From: "ashok.sa...@wipro.com" <ashok.sa...@wipro.com>
To: user@hive.apache.org
Sent: Wednesday, September 12, 2012 8:55 AM
Subject: RE: Performance: hive+hbase integration query against the row_key

After loading the data into Hive tables, the files get automatically deleted from HDFS. How do I stop that?

Thanks
Ashok

-----Original Message-----
From: Alan Gates [mailto:ga...@hortonworks.com]
Sent: 12 September 2012 06:51
To: user@hive.apache.org
Subject: Re: Performance: hive+hbase integration query against the row_key

On Sep 11, 2012, at 7:00 AM, bharath vissapragada wrote:

> Hey,
>
> Hive does all kinds of parsing, metadata lookups, query tree building and
> stuff before executing the query. Not sure if this all was included in those
> 36 seconds!
>
> Also what hive does is, it builds a scan object with ranges based on
> predicates (and mappers too) on key column and not a direct "get" call as in
> hbase shell. This might incur some overhead too!

Since Hive does this in a MapReduce job it definitely incurs overhead. It does not run directly against HBase as you might wish it did here.

Alan.
>
> On Tue, Sep 11, 2012 at 7:10 PM, Shengjie Min <kelvin....@gmail.com> wrote:
> Hi,
>
> I am trying to get hive working on top of my hbase table following the guide
> below:
> https://cwiki.apache.org/Hive/hbaseintegration.html
>
> CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a,cf:b,cf:c")
> TBLPROPERTIES ("hbase.table.name"="test");
>
> this hive table creation makes my mapping roughly look like this:
>
> hive_hbase_test VS test
> Hive key - hbase row_key
> Hive column a - hbase cf:a
> Hive column b - hbase cf:b
> Hive column c - hbase cf:c
>
> From my understanding on how HBaseStorageHandler works, it's supposed to take
> advantage of the hbase row_key index as much as possible. So I would expect,
>
> 1. if you do a hive query against the row key like "select * from
> hive_hbase_test where key='blabla'", this would utilize the hbase row_key
> index which give you very quick nearly real-time response just like hbase
> does.
>
> 2. of coz, if you do a hive query against a column like "select * from
> hive_hbase_test where a='blabla'", in this case, it queries against a
> specific column, it probably uses mapred because there is nothing from Hbase
> side can be utilized.
>
> From my test, query 1 doesn't seem fast at all, still taking ages, so
>
> select * from hive_hbase_test where key='blabla'    36secs
> vs
> get 'test', 'blabla'                                less than 1 sec
>
> still shows a huge difference.
>
> Anybody has tried this before? Is there anyway I can do sort of query plan
> analysis against hive query? or I am not mapping hive table against hbase
> table correctly?
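On the query plan question above: Hive's EXPLAIN statement prints the plan for a query without running it. A minimal sketch, reusing the table name and predicate from the message above. The exact output, and whether the key predicate is pushed down into the HBase scan, depend on the Hive version in use, so treat this as a starting point rather than a diagnosis.

```sql
-- Print the planned stages for the row-key lookup; the query itself is not executed.
EXPLAIN
SELECT * FROM hive_hbase_test WHERE key = 'blabla';

-- EXTENDED prints additional detail about the operators in each stage.
EXPLAIN EXTENDED
SELECT * FROM hive_hbase_test WHERE key = 'blabla';
```

If the plan shows a map-reduce stage, the job startup overhead Alan describes applies regardless of how selective the key predicate is.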
>
> --
> All the best,
> Shengjie Min
>
> --
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com

Please do not print this email unless it is absolutely necessary.
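For reference, the external-table approach Bejoy suggests in this thread can be sketched roughly as below. The table name, columns, delimiter, and HDFS path are all hypothetical placeholders, not taken from the thread; only the EXTERNAL keyword and the LOCATION clause are the point of the example.

```sql
-- Hypothetical example: table name, columns, delimiter and path are placeholders.
-- EXTERNAL plus LOCATION tells Hive to read the files where they already are,
-- so no 'LOAD DATA INPATH' (and therefore no HDFS move) is needed.
CREATE EXTERNAL TABLE my_logs (id STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/ashok/my_logs';
```

With a table defined this way, the data files stay in the given directory, and dropping the table removes only the Hive metadata, not the files.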