Hi,

John, are there plans or specific JIRA issues addressing this particular
performance hit that you or somebody else is working on? Those of us
interested in performance improvements when Hive points to external tables in
HBase would like to watch them.

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: John Sichi <jsi...@fb.com>
> To: "<user@hive.apache.org>" <user@hive.apache.org>
> Sent: Tue, March 8, 2011 1:17:51 AM
> Subject: Re: Performance between Hive queries vs. Hive over HBase queries
> 
> For native tables, Hive reads rows directly from HDFS.
> 
> For HBase tables, it has to go through the HBase region servers, which
> reconstruct rows from column families (combining cache + HDFS).
> 
> HBase makes it possible to keep your table up to date in real time, but you
> have to pay an overhead cost at query time.
> 
> On the other hand, with native Hive tables, there's latency in loading new
> batches of data.
> 
> JVS
> 
> On Mar 7, 2011, at 10:13 PM,  Biju Kaimal wrote:
> 
> > Hi,
> > 
> > Could you please explain the reason for the behavior?
> > 
> > Regards,
> > Biju
> > 
> > On Tue, Mar 8, 2011 at 11:35 AM, John Sichi <jsi...@fb.com>  wrote:
> > Yes.
> > 
> > JVS
> > 
> > On Mar 7, 2011, at  9:59 PM, Biju Kaimal wrote:
> > 
> > > Hi,
> > >
> > > I loaded a data set which has 1 million rows into both Hive and HBase
> > > tables. For the HBase table, I created a corresponding Hive table so that
> > > the data in HBase can be queried from Hive QL. Both tables have a key
> > > column and a value column.
> > >
> > > For the same query (select value, count(*) from table group by value),
> > > the Hive-only query runs much faster (~30 seconds) than the Hive over
> > > HBase query (~150 seconds).
> > >
> > > Is this  expected?
> > >
> > > Regards,
> > > Biju
> > 
> > 
> 
> 
