On Thu, May 10, 2012 at 9:26 AM, Kuldeep Chitrakar <kuldeep.chitra...@synechron.com> wrote:
> Hi
>
> I have a data warehouse implementation for clickstream data analysis on an
> RDBMS. It's a star schema (dimensions and facts).
>
> Now if I want to move to Hive, do I need to create the same data model as
> dimensions and facts and join them?
>
> Or should I create a big de-normalized table which contains all textual
> attributes from all dimensions? If so, how do we handle SCD Type 2 dimensions
> in Hive?
>
> It's a very basic question but I am just confused on this.
>
> Thanks,
> Kuldeep
While Hive is sometimes referred to as a data warehouse, you usually want to avoid classic data warehouse designs like the star schema. There are a number of reasons for this:

1) No unique constraints
2) Limited index capabilities
3) Map-side joins are only optimal when one of the tables is small
4) Most join types, while they generalize into MapReduce, behave much differently than a join in a single-node database

In most situations I advise going the "NoSQL route" and de-normalizing almost everything. Optimize for scanning.
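To make the de-normalization concrete, here is a rough HiveQL sketch (table and column names are hypothetical, just for illustration): rather than keeping the dimensions separate and joining at query time, you materialize one wide table at load time.

```sql
-- Hypothetical star schema: a clicks fact table plus user/page dimensions.
-- Instead of joining at query time, build one wide, scan-friendly table.
CREATE TABLE clicks_denorm AS
SELECT
  f.click_time,
  f.session_id,
  u.user_name,        -- textual attributes copied in from the user dimension
  u.user_country,
  p.page_url,         -- textual attributes copied in from the page dimension
  p.page_category
FROM clicks_fact f
JOIN dim_user u ON f.user_id = u.user_id
JOIN dim_page p ON f.page_id = p.page_id;
```

A side effect that helps with the SCD Type 2 question: because the dimension attributes are copied in at load time, each row carries the attribute values as they were when the click happened, which is often exactly the "point in time" behavior an SCD Type 2 dimension is meant to provide.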