Factor of 5 closely matches the results I got when I was testing.

JVS

On Mar 9, 2011, at 1:23 PM, Otis Gospodnetic wrote:

> Hi,
>
> Biju's example shows a factor of 5 decrease in performance when Hive points
> to HBase tables.
>
> Does anyone know how much this factor varies? Is it often closer to 1, or is
> it more often close to 10?
> Just trying to get a better feel for this...
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
>> From: John Sichi <jsi...@fb.com>
>> To: "<user@hive.apache.org>" <user@hive.apache.org>
>> Sent: Tue, March 8, 2011 1:05:34 AM
>> Subject: Re: Performance between Hive queries vs. Hive over HBase queries
>>
>> Yes.
>>
>> JVS
>>
>> On Mar 7, 2011, at 9:59 PM, Biju Kaimal wrote:
>>
>>> Hi,
>>>
>>> I loaded a data set which has 1 million rows into both Hive and HBase
>>> tables. For the HBase table, I created a corresponding Hive table so that
>>> the data in HBase can be queried from Hive QL. Both tables have a key
>>> column and a value column.
>>>
>>> For the same query (select value, count(*) from table group by value), the
>>> Hive-only query runs much faster (~ 30 seconds) as compared to Hive over
>>> HBase (~ 150 seconds).
>>>
>>> Is this expected?
>>>
>>> Regards,
>>> Biju
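
For reference, the setup Biju describes could look roughly like the HiveQL below. This is only a sketch: the table names (kv_native, kv_hbase), the HBase table name (kv), and the column family and qualifier (cf:val) are assumptions made for illustration, not details from the thread. The storage handler class and the hbase.columns.mapping property are the ones documented for Hive's HBase integration.

```sql
-- Native Hive table holding the key/value data (hypothetical name).
CREATE TABLE kv_native (key STRING, value STRING);

-- Hive table mapped onto an existing HBase table (hypothetical names).
-- ":key" maps the first Hive column to the HBase row key;
-- "cf:val" maps the second column to qualifier "val" in family "cf".
CREATE EXTERNAL TABLE kv_hbase (key STRING, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "kv");

-- The same aggregation run against each table.
SELECT value, count(*) FROM kv_native GROUP BY value;
SELECT value, count(*) FROM kv_hbase GROUP BY value;
```

With the timings reported above, the slowdown works out to about 150 s / 30 s = 5, which is the factor of 5 discussed in this thread.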