Shiv,

Neither HBase nor Hive alone is a perfect fit for a data warehouse application. We have to use Hive and HBase together to feed into a traditional data warehouse application.

HBase: HBase is more update oriented. Insert / Update / Delete operations are efficient in HBase, but it doesn't perform as well as Hive on frequent reads.

Hive: Hive, however, offers a very good interface in HiveQL. Having a SQL-like interface makes data retrieval easy as well as efficient.

HBase is a NoSQL database. Once you receive data from the outside world, you can use HBase as a datastore to feed your data warehouse, and Hive can serve as the interface to retrieve the data for analysis.

One more aspect you can consider here is the use of Pig scripts, which have proved to be a very good analysis tool. With Pig you don't need to maintain a schema, yet you can still write code much like a SQL script.

PS: Search the Apache repository for the HBase and Hive interface to see how the two can talk to each other.

Thank You,
Manish

From: Shiv Sharma [mailto:aatman.eq.brah...@gmail.com]
Sent: Wednesday, February 22, 2012 11:06 PM
To: user@hive.apache.org
Subject: HIVE vs HBASE for Datawarehousing

4 Newbie questions:

1. Assuming we are ok with non-SQL access, would HBASE work as a store for a datawarehouse? Basically, why HIVE for a warehouse? Why not HBASE? I understand the SQL interface to HIVE, but are there other reasons?

2. How is the HBASE data model different from Hive? BigTable has this wiki description:
    sparse, distributed multi-dimensional sorted map
I could not find the corresponding description for HBASE, but I assume this is true for HBASE as well. So
2.1 Is the BigTable description true for HBASE as well?
2.2 What is the corresponding description for HIVE?

3) ETL in HIVE
One typical pattern in traditional ETL is:
-- for each dimension element in the fact stream, look up the dimension to see if the dimension value exists
     if it exists, get the dimension key
     if not, insert the new dimension value and use this (new) value for the current record
3.1 Can this be achieved in HIVE?
3.2 Can it be done in HIVE-SQL?

4) (More ETL) I often find myself updating tables to add more context from "later arriving data".
This takes the form of updating columns in dimension tables, or updating an aggregate table and such.
4.1 Can this be achieved in HIVE?
4.2 Can it be done in HIVE-SQL?

Thank you,
Shiv
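
For readers unfamiliar with the dimension lookup pattern described in question 3, here is a minimal Python sketch of the logic. It says nothing about whether or how Hive can do this; it only makes the pattern concrete. The names `lookup_or_insert`, `resolve_fact_stream`, and `city_dim` are hypothetical, a plain dict stands in for the dimension table, and the "next integer" key scheme is an assumption.

```python
def lookup_or_insert(dimension, value):
    """Return the key for a dimension value, inserting the value if it is missing."""
    key = dimension.get(value)
    if key is None:
        # Assumed key scheme: next integer after the current row count.
        key = len(dimension) + 1
        dimension[value] = key
    return key


def resolve_fact_stream(facts, dimension, field):
    """Attach the dimension key for `field` to every record in the fact stream."""
    resolved = []
    for record in facts:
        key = lookup_or_insert(dimension, record[field])
        # Copy the record and add the key under "<field>_key".
        resolved.append({**record, field + "_key": key})
    return resolved


# Hypothetical dimension table (value -> key) and fact stream.
city_dim = {"Pune": 1}
facts = [
    {"city": "Pune", "amount": 10},
    {"city": "Delhi", "amount": 5},
    {"city": "Delhi", "amount": 7},
]
resolved = resolve_fact_stream(facts, city_dim, "city")
```

In this sketch the first "Delhi" record triggers an insert into `city_dim`, and the second one reuses the key that insert created, which is the lookup-or-insert behaviour the question asks about.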