HDFS lacks random read and write access; this is where HBase comes into the
picture. HBase stores data as key/value pairs.
Hive provides data warehousing facilities on top of an existing Hadoop
cluster. It offers a SQL-like interface, which makes your work easier.
You can create tables in Hive and store data there. Along with that, you can
even map your existing HBase tables to Hive and operate on them.
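As a sketch of that last point, here is roughly what mapping an existing HBase table into Hive looks like using Hive's HBase storage handler. The table names, columns, and column family (`weblogs`, `cf`, etc.) are made up for illustration:

```sql
-- Map an existing HBase table (assumed here to be named 'weblogs',
-- with a column family 'cf') so that Hive queries can operate on it.
CREATE EXTERNAL TABLE weblogs_hbase (
  rowkey  STRING,
  user_id STRING,
  url     STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:user_id,cf:url"
)
TBLPROPERTIES ("hbase.table.name" = "weblogs");

-- The kind of aggregate this thread asks about, now runnable in Hive:
SELECT COUNT(DISTINCT user_id) FROM weblogs_hbase;
```

The `:key` entry in `hbase.columns.mapping` binds the HBase row key to the first Hive column; the remaining entries map `family:qualifier` pairs to columns in order.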


On Wed, Apr 30, 2014 at 2:04 PM, Shushant Arora
<shushantaror...@gmail.com>wrote:

> I have a requirement of processing huge weblogs on a daily basis.
>
> 1. Data will come incrementally into the datastore on a daily basis, and I
> need cumulative and daily
> distinct user counts from the logs; after that, the aggregated data will be
> loaded into an RDBMS like mysql.
>
> 2. Data will be loaded into the HDFS data warehouse on a daily basis; the
> same will be fetched from the HDFS warehouse, after some filtering, into an
> RDBMS like mysql and will be processed there.
>
> Which data warehouse is suitable for approaches 1 and 2, and why?
>
> Thanks
> Shushant
>
>


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Center for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
