On Thu, May 10, 2012 at 9:26 AM, Kuldeep Chitrakar <kuldeep.chitra...@synechron.com> wrote:
> Hi
>
> I have a data warehouse implementation for clickstream data analysis on an
> RDBMS. It's a star schema (dimensions and facts).
>
> Now if I want to move to Hive, do I need to create the same data model as
> dimensions and facts and join them?
>
> Or should I create a big de-normalized table which contains all textual
> attributes from all dimensions? If so, how do we handle SCD Type 2 dimensions
> in Hive?
>
> It's a very basic question but I am just confused on this.
>
> Thanks,
> Kuldeep
While Hive is sometimes referred to as a data warehouse, you usually want to avoid classic data warehouse designs like the star schema. There are a number of reasons for this:

1) No unique constraints
2) Limited index capabilities
3) Map-side joins are only optimal when one of the tables is small
4) Most join types, while they generalize into MapReduce, behave much differently than a join in a single-node database

In most situations I advise going the "NoSQL route" and de-normalizing almost everything. Optimize for scanning.
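To make the de-normalization concrete, here is a rough HiveQL sketch (table and column names are hypothetical, just for illustration): rather than keeping the dimensions separate and joining at query time, you materialize one wide table at load time.

```sql
-- Hypothetical star schema: a clicks fact table plus user/page dimensions.
-- Instead of joining at query time, build one wide, scan-friendly table.
CREATE TABLE clicks_denorm AS
SELECT
  f.click_time,
  f.session_id,
  u.user_name,        -- textual attributes copied in from the user dimension
  u.user_country,
  p.page_url,         -- textual attributes copied in from the page dimension
  p.page_category
FROM clicks_fact f
JOIN dim_user u ON f.user_id = u.user_id
JOIN dim_page p ON f.page_id = p.page_id;
```

A side effect that helps with the SCD Type 2 question: because the dimension attributes are copied in at load time, each row carries the attribute values as they were when the click happened, which is often exactly the "point in time" behavior an SCD Type 2 dimension is meant to provide.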