Hi,

Biju's example shows a factor of 5 decrease in performance when Hive points to 
HBase tables.

Does anyone know how much this factor varies?  Is it more often closer to 1, or
closer to 10?
Just trying to get a better feel for this...

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: John Sichi <jsi...@fb.com>
> To: "<user@hive.apache.org>" <user@hive.apache.org>
> Sent: Tue, March 8, 2011 1:05:34 AM
> Subject: Re: Performance between Hive queries vs. Hive over HBase queries
> 
> Yes.
> 
> JVS
> 
> On Mar 7, 2011, at 9:59 PM, Biju Kaimal  wrote:
> 
> > Hi,
> > 
> > I loaded a data set which has 1 million rows into both Hive and HBase
> > tables. For the HBase table, I created a corresponding Hive table so that the
> > data in HBase can be queried from Hive QL. Both tables have a key column and
> > a value column
> > 
> > For the same query (select value, count(*) from table group by value), the
> > Hive only query runs much faster (~ 30 seconds) as compared to Hive over
> > HBase (~ 150 seconds).
> > 
> > Is this expected?
> > 
> > Regards,
> >  Biju
> 
> 
