On Wed, Mar 9, 2011 at 4:31 PM, John Sichi <jsi...@fb.com> wrote:
> Factor of 5 closely matches the results I got when I was testing.
>
> JVS
>
> On Mar 9, 2011, at 1:23 PM, Otis Gospodnetic wrote:
>
>> Hi,
>>
>> Biju's example shows a factor of 5 decrease in performance when Hive points
>> to HBase tables.
>>
>> Does anyone know how much this factor varies? Is it often closer to 1, or is
>> it more often close to 10?
>> Just trying to get a better feel for this...
>>
>> Thanks,
>> Otis
>>
>>
>>
>> ----- Original Message ----
>>> From: John Sichi <jsi...@fb.com>
>>> To: "<user@hive.apache.org>" <user@hive.apache.org>
>>> Sent: Tue, March 8, 2011 1:05:34 AM
>>> Subject: Re: Performance between Hive queries vs. Hive over HBase queries
>>>
>>> Yes.
>>>
>>> JVS
>>>
>>> On Mar 7, 2011, at 9:59 PM, Biju Kaimal wrote:
>>>
>>>> Hi,
>>>>
>>>> I loaded a data set which has 1 million rows into both Hive and HBase
>>>> tables. For the HBase table, I created a corresponding Hive table so that
>>>> the data in HBase can be queried from Hive QL. Both tables have a key
>>>> column and a value column.
>>>>
>>>> For the same query (select value, count(*) from table group by value), the
>>>> Hive-only query runs much faster (~30 seconds) than the Hive-over-HBase
>>>> query (~150 seconds).
>>>>
>>>> Is this expected?
>>>>
>>>> Regards,
>>>> Biju
>>>
>>>
>
>
Some overhead is to be expected. With an HBase-backed table the data has to
move HDFS -> RegionServer -> TaskTracker, instead of the task reading it
from HDFS directly. Another factor is how many column families your table
scan spans.
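
For reference, here is a minimal sketch of the kind of setup being compared,
assuming Hive's HBase storage handler is used for the HBase-backed table. The
Hive table names (kv_native, kv_hbase), the HBase table name (kv), and the
column family and qualifier (cf:val) are hypothetical; the original post does
not give them.

```sql
-- Native Hive table: the map tasks read its files from HDFS.
CREATE TABLE kv_native (key STRING, value STRING);

-- Hive table over an existing HBase table (names are assumptions).
-- Rows reach the map tasks through the RegionServers.
-- Only cf:val is mapped here, so a single column family is involved.
CREATE EXTERNAL TABLE kv_hbase (key STRING, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "kv");

-- The same aggregate against each table, as in the original post.
SELECT value, count(*) FROM kv_native GROUP BY value;
SELECT value, count(*) FROM kv_hbase GROUP BY value;
```

If the HBase table had several column families mapped, each additional
family in hbase.columns.mapping would be one more family for the scan to
cover.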