Thanks. I think a concrete example will help me understand your suggestions.
Now I have a relational table:

ID   LoginTime             DeviceID
1    2012-12-12 12:12:12   abcdef
2    2012-12-12 19:12:12   abcdef
3    2012-12-13 10:10:10   defdaf

There are several requirements for this table:
1. How many devices log in each day?
2. For a given day, how many new devices log in (devices that have never logged in before)?
3. For a given day, how many accumulated devices have logged in?

How can I design HBase tables to calculate this data? My current solution is:

table A:
rowkey: date-deviceid
column family: login
column qualifier: 2012-12-12 12:12:12 / 2012-12-12 19:12:12 ...

table B:
rowkey: deviceid
column family: null or any value
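
To make the table A layout concrete, here is a minimal sketch in plain Python. It uses an in-memory dict as a stand-in for HBase, so nothing here is the HBase client API; the helper names (table_a_rowkey, build_table_a, prefix_scan) are made up for illustration. It only relies on the fact that HBase keeps rows sorted by rowkey, which is why a date prefix can be read as one contiguous range.

```python
import bisect

# Sample rows from the relational table above: (ID, LoginTime, DeviceID).
LOGINS = [
    (1, "2012-12-12 12:12:12", "abcdef"),
    (2, "2012-12-12 19:12:12", "abcdef"),
    (3, "2012-12-13 10:10:10", "defdaf"),
]


def table_a_rowkey(login_time, device_id):
    """Build the 'date-deviceid' rowkey; the date is the first 10 chars of LoginTime."""
    return f"{login_time[:10]}-{device_id}"


def build_table_a(logins):
    """Return {rowkey: [login times]}; each login time stands in for a column qualifier."""
    table = {}
    for _id, login_time, device_id in logins:
        table.setdefault(table_a_rowkey(login_time, device_id), []).append(login_time)
    return table


def prefix_scan(table, prefix):
    """Return the rowkeys that start with prefix, reading sorted keys as one range."""
    keys = sorted(table)
    start = bisect.bisect_left(keys, prefix)
    result = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append(key)
    return result


table_a = build_table_a(LOGINS)
rows_for_day = prefix_scan(table_a, "2012-12-12-")
```

With the three sample rows, both logins of device abcdef on 2012-12-12 land in the same row, so the number of rows under a date prefix is the number of distinct devices for that day.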

For req#1, I can scan table A with a PrefixFilter on the rowkey to select one 
specific date, and count the records.
For req#2, I do a get on table B for each deviceid, and count the results.
For req#3, I count table A with a PrefixFilter, as in req#1.
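
The three lookups can be sketched roughly as follows, again with in-memory Python dicts standing in for the two HBase tables. Two things here are assumptions and not part of the design above: table B stores the device's first login date as its value, and "accumulated" is read as "devices whose first login is on or before the given day". All function names are hypothetical.

```python
# Sample rows from the relational table above: (ID, LoginTime, DeviceID).
LOGINS = [
    (1, "2012-12-12 12:12:12", "abcdef"),
    (2, "2012-12-12 19:12:12", "abcdef"),
    (3, "2012-12-13 10:10:10", "defdaf"),
]


def build_tables(logins):
    """Load logins in time order into stand-ins for table A and table B."""
    table_a = {}  # rowkey "date-deviceid" -> set of login times
    table_b = {}  # rowkey deviceid -> first login date (assumed value)
    for _id, login_time, device_id in sorted(logins, key=lambda row: row[1]):
        day = login_time[:10]
        table_a.setdefault(f"{day}-{device_id}", set()).add(login_time)
        table_b.setdefault(device_id, day)  # keeps only the earliest date
    return table_a, table_b


def devices_on(table_a, day):
    """Device ids seen on one day: the rows of table A under the date prefix."""
    prefix = day + "-"
    return {key[len(prefix):] for key in table_a if key.startswith(prefix)}


def daily_devices(table_a, day):
    """Req#1: number of distinct devices that logged in on the day."""
    return len(devices_on(table_a, day))


def new_devices(table_a, table_b, day):
    """Req#2: devices seen on the day whose first login date in table B is that day."""
    return sum(1 for dev in devices_on(table_a, day) if table_b.get(dev) == day)


def accumulated_devices(table_b, day):
    """Req#3 (assumed reading): devices whose first login is on or before the day."""
    return sum(1 for first_day in table_b.values() if first_day <= day)


table_a, table_b = build_tables(LOGINS)
```

Under these assumptions, req#3 is answered from table B, not from table A, and it touches every row of table B; if that table grows large, a precomputed per-day counter might be cheaper.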
Is this OK? Or are there better solutions?
Thanks!!

> CC: [email protected]
> From: [email protected]
> Subject: Re: How to design a data warehouse in HBase?
> Date: Thu, 13 Dec 2012 08:43:31 +0000
> To: [email protected]
> 
> You need to spend a bit of time on Schema design.
> You need to flatten your Schema...
> Implement some secondary indexing to improve join performance...
> 
> Depends on what you want to do... There are other options too...
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Dec 13, 2012, at 7:09 AM, lars hofhansl <[email protected]> wrote:
> 
> > For OLAP type queries you will generally be better off with a truly column 
> > oriented database.
> > You can probably shoehorn HBase into this, but it wasn't really designed 
> > with raw scan performance along single columns in mind.
> > 
> > 
> > 
> > ________________________________
> > From: bigdata <[email protected]>
> > To: "[email protected]" <[email protected]> 
> > Sent: Wednesday, December 12, 2012 9:57 PM
> > Subject: How to design a data warehouse in HBase?
> > 
> > Dear all,
> > We have a traditional star-model data warehouse in an RDBMS, and now we want to 
> > transfer it to HBase. After studying HBase, I learned that HBase is normally 
> > queried by rowkey:
> > 1. full rowkey (fastest)
> > 2. rowkey filter (fast)
> > 3. column family/qualifier filter (slow)
> > How can I design the HBase tables to implement the warehouse functions, 
> > like:
> > 1. Query by DimensionA
> > 2. Query by DimensionA and DimensionB
> > 3. Sum, count, distinct ...
> > In my opinion, I should create several HBase tables with all combinations 
> > of different dimensions as the rowkey. This solution will lead to huge data 
> > duplication. Are there any good suggestions to solve this?
> > Thanks a lot!
                                          
