RE: HIVE vs HBASE for Datawarehousing

Manish . Bhoge Wed, 22 Feb 2012 22:09:18 -0800

Shiv,

Neither HBase nor Hive alone is a perfect fit for a data warehouse application.
We have to use Hive and HBase together to feed a traditional data warehouse
application.


HBase: HBase is a NoSQL database and is more update-oriented. Insert / update /
delete operations are efficient in HBase, but it does not perform as well as
Hive on frequent reads.
Hive: Hive, however, offers a very good interface in HiveQL. Having a SQL-like
interface makes data retrieval easy as well as efficient.

Once you receive data from the outside world, you can use HBase as a datastore
to feed your data warehouse, and Hive can serve as an interface to retrieve the
data for analysis.

One more aspect to consider is the use of Pig scripts, which have proved to be
a very good analysis tool. With Pig you don't need to maintain a schema, and
you can still write code that reads like a SQL script.


PS: Search the Apache repository for the HBase and Hive integration to see how
the two can talk to each other.

Thank You,
Manish

From: Shiv Sharma [mailto:aatman.eq.brah...@gmail.com]
Sent: Wednesday, February 22, 2012 11:06 PM
To: user@hive.apache.org
Subject: HIVE vs HBASE for Datawarehousing

4 Newbie questions:

1. Assuming we are ok with non-SQL access, would HBASE  work as a store for a 
datawarehouse?

      Basically, why HIVE for a warehouse? Why not HBASE? I understand the SQL 
interface to HIVE, but are there other reasons?

2. How is the HBASE data model different from Hive?

BigTable has this wiki description:
"sparse, distributed multi-dimensional sorted map"

I could not find the corresponding description for HBASE, but I assume this is 
true for HBASE as well.

So 2.1 Is the BigTable description true for HBASE as well?
   2.2 What is the corresponding description for HIVE?
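
As an illustration of the "sparse, distributed multi-dimensional sorted map"
phrase quoted above, here is a minimal single-process sketch in Python. It is
not HBase or BigTable code: the class name SparseSortedMap and its methods are
hypothetical, distribution is left out entirely, and timestamps are sorted in
ascending order purely for simplicity.

```python
from typing import Dict, List, Optional, Tuple

# A cell is addressed by (row key, column, timestamp): the "multi-dimensional" part.
Cell = Tuple[str, str, int]


class SparseSortedMap:
    """Toy in-memory model; only cells that were written take up space (sparse)."""

    def __init__(self) -> None:
        self._cells: Dict[Cell, str] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the value with the highest timestamp, or None if the cell is absent."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self) -> List[Tuple[Cell, str]]:
        """Return all cells ordered by (row, column, timestamp): the "sorted" part."""
        return sorted(self._cells.items(), key=lambda item: item[0])
```

Rows that never received a value for a given column simply have no entry, which
is what distinguishes this layout from a fixed-width relational row.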

3) ETL in HIVE

 One typical pattern in traditional ETL is:
    -- for each dimension element in the fact stream, look up the dimension to
       see if the dimension value exists
         if it exists, get the dimension key
         if not, insert the new dimension value and use this (new) value for
         the current record
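
The lookup-or-insert pattern above can be sketched in plain Python. This is
only an illustration of the logic, not Hive code and not a claim about how Hive
would implement it; the function name assign_dimension_keys, the in-memory
dictionary standing in for the dimension table, and the sample values are all
hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def assign_dimension_keys(
    fact_values: Iterable[str],
    dimension: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Pair each fact value with its dimension key, inserting unseen values.

    `dimension` maps dimension value -> surrogate key and is updated in place.
    """
    # Assumption: surrogate keys are positive integers handed out sequentially.
    next_key = max(dimension.values(), default=0) + 1
    keyed = []
    for value in fact_values:
        key = dimension.get(value)      # look up the dimension
        if key is None:                 # value does not exist yet
            key = next_key
            dimension[value] = key      # insert the new dimension value
            next_key += 1
        keyed.append((value, key))      # use the key for the current record
    return keyed


# Hypothetical sample data: a small country dimension and a stream of facts.
dimension = {"US": 1, "IN": 2}
facts = ["IN", "DE", "US", "DE"]
keyed = assign_dimension_keys(facts, dimension)
print(keyed)
print(dimension)
```

The loop handles one record at a time and mutates the dimension as it goes,
which is the row-by-row behaviour the question is asking about.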

  3.1 Can this be achieved in HIVE?
  3.2 Can it be done in HIVE-SQL?


4)  (More ETL)
I often find myself updating tables to add more context from "later arriving 
data". This takes the form of updating columns in dimension tables,
or updating an aggregate table and such.

4.1 Can this be achieved in HIVE?
4.2 Can it be done in HIVE-SQL?

Thank you,
Shiv
