Thanks. I think a real example would help me understand your suggestions. I have a relational table:

ID  LoginTime            DeviceID
1   2012-12-12 12:12:12  abcdef
2   2012-12-12 19:12:12  abcdef
3   2012-12-13 10:10:10  defdaf

There are several requirements for this table:
1. How many devices log in on each day?
2. For a given day, how many new devices log in (devices that have never logged in before)?
3. For a given day, how many devices have logged in in total up to that day (accumulated)?

How can I design HBase tables to calculate these figures? My current solution is:

table A:
  rowkey: date-deviceid
  column family: login
  column qualifier: 2012-12-12 12:12:12/2012-12-12 19:12:12....

table B:
  rowkey: deviceid
  column family: null or any value
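
To make the layout concrete, here is a rough in-memory sketch in Python of the two tables. Plain dicts and a sorted key list stand in for HBase; none of this is the HBase client API, and every function name is made up for illustration. It also assumes that table B stores the device's first login date as its value, which the layout above leaves open ("null or any value"), so treat the last two counts as one possible reading rather than the design itself.

```python
from bisect import bisect_left

# Sample rows from the relational table: (ID, LoginTime, DeviceID).
LOGINS = [
    (1, "2012-12-12 12:12:12", "abcdef"),
    (2, "2012-12-12 19:12:12", "abcdef"),
    (3, "2012-12-13 10:10:10", "defdaf"),
]


def build_tables(logins):
    """Build in-memory stand-ins for table A and table B."""
    table_a = {}  # rowkey "date-deviceid" -> list of login times (the qualifiers)
    table_b = {}  # rowkey deviceid -> first login date (ASSUMED value)
    # Process logins in time order so the first date seen per device is the earliest.
    for _id, login_time, device_id in sorted(logins, key=lambda row: row[1]):
        day = login_time.split(" ")[0]
        table_a.setdefault(f"{day}-{device_id}", []).append(login_time)
        table_b.setdefault(device_id, day)
    return table_a, table_b


def prefix_scan(table, prefix):
    """Return the row keys that start with prefix, walking keys in sorted order."""
    keys = sorted(table)
    matches = []
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(key)
    return matches


def devices_on(table_a, day):
    """Req#1: one row per (day, device), so the row count is the device count."""
    return len(prefix_scan(table_a, f"{day}-"))


def new_devices_on(table_b, day):
    """Req#2: devices whose first login date is exactly this day."""
    return sum(1 for first_day in table_b.values() if first_day == day)


def accumulated_devices(table_b, day):
    """Req#3: devices first seen on or before this day (ISO dates sort as strings)."""
    return sum(1 for first_day in table_b.values() if first_day <= day)


table_a, table_b = build_tables(LOGINS)
```

With the three sample rows, `devices_on(table_a, "2012-12-12")` counts the single row for device abcdef, even though it logged in twice that day, because both logins share one rowkey.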

For req#1, I can scan table A with a PrefixFilter on the rowkey to select one specific date, and count the records.
For req#2, I get each deviceid from table B and count the results.
For req#3, I count table A with a PrefixFilter, as in req#1.

Is this OK? Or are there better solutions?

Thanks!!

> CC: [email protected]
> From: [email protected]
> Subject: Re: How to design a data warehouse in HBase?
> Date: Thu, 13 Dec 2012 08:43:31 +0000
> To: [email protected]
>
> You need to spend a bit of time on Schema design.
> You need to flatten your Schema...
> Implement some secondary indexing to improve join performance...
>
> Depends on what you want to do... There are other options too...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Dec 13, 2012, at 7:09 AM, lars hofhansl <[email protected]> wrote:
>
> > For OLAP type queries you will generally be better off with a truly column
> > oriented database.
> > You can probably shoehorn HBase into this, but it wasn't really designed
> > with raw scan performance along single columns in mind.
> >
> > ________________________________
> > From: bigdata <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Wednesday, December 12, 2012 9:57 PM
> > Subject: How to design a data warehouse in HBase?
> >
> > Dear all,
> > We have a traditional star-model data warehouse in an RDBMS, and now we
> > want to move it to HBase. After studying HBase, I learned that HBase is
> > normally queried by rowkey:
> > 1. full rowkey (fastest)
> > 2. rowkey filter (fast)
> > 3. column family/qualifier filter (slow)
> > How can I design the HBase tables to implement the warehouse functions,
> > like:
> > 1. Query by DimensionA
> > 2. Query by DimensionA and DimensionB
> > 3. Sum, count, distinct ...
> > In my opinion, I should create several HBase tables with all combinations
> > of the different dimensions as the rowkey. This solution will lead to huge
> > data duplication. Are there any good suggestions to solve this?
> > Thanks a lot!
