Hi,

Biju's example shows a factor of 5 decrease in performance when Hive points to HBase tables.
Does anyone know how much this factor varies? Is it often closer to 1, or is it more often close to 10? Just trying to get a better feel for this...

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: John Sichi <jsi...@fb.com>
> To: "<user@hive.apache.org>" <user@hive.apache.org>
> Sent: Tue, March 8, 2011 1:05:34 AM
> Subject: Re: Performance between Hive queries vs. Hive over HBase queries
>
> Yes.
>
> JVS
>
> On Mar 7, 2011, at 9:59 PM, Biju Kaimal wrote:
>
> > Hi,
> >
> > I loaded a data set which has 1 million rows into both Hive and HBase tables. For the HBase table, I created a corresponding Hive table so that the data in HBase can be queried from Hive QL. Both tables have a key column and a value column.
> >
> > For the same query (select value, count(*) from table group by value), the Hive only query runs much faster (~ 30 seconds) as compared to Hive over HBase (~ 150 seconds).
> >
> > Is this expected?
> >
> > Regards,
> > Biju