Re: Dimensional Data Model on Hive

Jagat Thu, 10 May 2012 08:17:45 -0700

Hello

Try to keep the set of records you need for a particular analysis in the
same table. We generally use Pig to feed data into Hive tables, and we have
arranged our tables so that all the data required for a particular report
is present in a single table. This helps improve Hive performance. While
designing your schema, partition and index your tables according to your
queries.
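
As a rough sketch of what I mean (the table name, column names, and date
value below are made up for illustration, not our actual schema), a
denormalized, partitioned report table could look something like this,
assuming a Hive version that still supports CREATE INDEX:

```sql
-- Hypothetical denormalized table: dimension attributes (browser, country)
-- are stored next to the click facts, so the report needs no joins.
CREATE TABLE clickstream_report (
  session_id   STRING,
  user_id      STRING,
  page_url     STRING,
  browser_name STRING,
  country_name STRING,
  event_time   STRING
)
PARTITIONED BY (event_date STRING)  -- partition on the column most queries filter on
STORED AS SEQUENCEFILE;

-- Optional index on a frequently filtered column (only on Hive versions
-- that support indexes).
CREATE INDEX idx_clickstream_user
ON TABLE clickstream_report (user_id)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- A report query that filters on the partition column reads only that partition.
SELECT country_name, COUNT(*) AS page_views
FROM clickstream_report
WHERE event_date = '2012-05-10'
GROUP BY country_name;
```

Which columns to partition and index on depends entirely on your own
queries, so treat the choices above as placeholders.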


The fact-and-dimension concept should not be taken too seriously here.


On Thu, May 10, 2012 at 6:56 PM, Kuldeep Chitrakar <
kuldeep.chitra...@synechron.com> wrote:

> Hi,
>
> I have a data warehouse implementation for click stream data analysis on
> an RDBMS. It's a star schema (dimensions and facts).
>
> Now if I want to move to Hive, do I need to create the same data model
> with dimensions and facts and join them?
>
> Or should I create a big de-normalized table which contains all textual
> attributes from all dimensions? If so, how do we handle SCD Type 2
> dimensions in Hive?
>
> It's a very basic question, but I am just confused about this.
>
> Thanks,
> Kuldeep
>
