Hi Puneet,

The upsert could be easy. For example, if we want to load the aggregate for hour ID 2015062412:

--------------------------------------------------------------------------------
 *ADID | COUNTRY | HOUR ID*   | CF.IMP |
--------------------------------------------------------------------------------
 1     | US      | 2015062412 | 2000   |
--------------------------------------------------------------------------------
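To make the wide-row model concrete, here is a minimal Phoenix DDL sketch of the daily table I have in mind (myTABLE, the CF column family, and the IMP01..IMP24 column names are just placeholders, not our real schema):

-- Sketch only: one row per (ADID, COUNTRY, DAYID), with one impression
-- column per hour of the day kept in column family CF.
CREATE TABLE IF NOT EXISTS myTABLE (
    ADID      BIGINT NOT NULL,
    COUNTRY   VARCHAR NOT NULL,
    DAYID     INTEGER NOT NULL,
    CF.IMP01  BIGINT,
    CF.IMP02  BIGINT,
    -- ... one column per hour, IMP03 through IMP23 ...
    CF.IMP24  BIGINT,
    CONSTRAINT PK PRIMARY KEY (ADID, COUNTRY, DAYID)
);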
Then the upsert will be:

UPSERT INTO myTABLE (ADID, COUNTRY, DAYID, IMP12) VALUES (1, 'US', 20150624, 2000);

Your point is true that HBase will still store 24 key-values. One of the eBay blogs suggests thinking of the key-value store as a sorted map:

SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>

Going by that model, I guess keeping 24 columns does make a difference versus putting the hour ID in the row key. (I have also added a rough query sketch for both models after the quoted thread below.)

Thanks
Pari

On 23 June 2015 at 19:10, Puneet Kumar Ojha <puneet.ku...@pubmatic.com> wrote:

> Even if you keep 24 columns, HBase will still store them as 24 records
> (key-values).
>
> There are benefits depending on the use case, such as breaking the data
> into column families, and how updates behave for a particular rowkey if
> you go with 24 columns.
>
> *From:* Pariksheet Barapatre [mailto:pbarapa...@gmail.com]
> *Sent:* Tuesday, June 23, 2015 7:07 PM
> *To:* user@phoenix.apache.org
> *Subject:* Re: Data Model Suggestion
>
> Thanks Skanda for your suggestion. If we go with 24 columns, won't it
> reduce the number of rows over a large data set and improve scan
> performance? If I want a daily report, I just have to scan a single row.
>
> Your thoughts?
>
> Thanks
> Pari
>
> On 23 June 2015 at 18:57, Skanda <skanda.ganapa...@gmail.com> wrote:
>
> Hi Pari,
>
> For your use case, having the hour as part of the rowkey should be a
> better design than creating so many columns.
>
> Regards
> Skanda
>
> On Tue, Jun 23, 2015 at 6:49 PM, Pariksheet Barapatre
> <pbarapa...@gmail.com> wrote:
>
> Hello All,
>
> This is more of an HBase question, but as I am planning to use Phoenix as
> the access layer, I hope Phoenix users can help me.
>
> I would like to model time-series data to get on-the-fly analytics. The
> use case is adTech.
>
> Report: hourly, daily, and weekly impression counts at country level for
> a given advertisement ID (ADID).
>
> I am doing hourly aggregation and loading the results into a Phoenix table.
>
> Primary Key - *ADID | COUNTRY | HOUR ID*
>
> ---------------------------------------------------------------
>  *ADID | COUNTRY | HOUR ID*   | CF.IMP |
> ---------------------------------------------------------------
>  1     | US      | 2015062301 | 3000   |
>  1     | US      | 2015062302 | 3421   |
>  1     | UK      | 2015062302 | 1212   |
> ---------------------------------------------------------------
>
> Is this a good schema design, or shall I create the alternate schema below?
>
> Primary Key - *ADID | COUNTRY | DAY ID*
>
> ---------------------------------------------------------------------
>  *ADID | COUNTRY | DAY ID*  | CF.IMP01 | CF.IMP02 |
> ---------------------------------------------------------------------
>  1     | US      | 20150623 | 3000     | 3421     |
>  1     | UK      | 20150623 | NULL     | 1212     |
> ---------------------------------------------------------------------
>
> Here, I have taken the hour part of the hour ID and created 24 columns
> (one per hour of the day).
>
> I have gone through many time-series NoSQL blog posts, and most authors
> suggest going with wider rows as above. This will reduce the scan, but I
> don't see much difference between the two data models in terms of scan
> latency.
>
> Can anybody please suggest a good approach for my use case?
>
> Cheers,
> Pari
>
> --
> Cheers,
> Pari

--
Cheers,
Pari
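P.S. For completeness, a rough, untested sketch of the daily report query against each model (the table names HOURLY_IMP / DAILY_IMP, the HOURID/DAYID column names, and the 01..24 hour-ID range are assumptions for illustration):

-- Model 1: hour ID in the row key; a range scan over 24 narrow rows
-- under the same (ADID, COUNTRY) prefix.
SELECT ADID, COUNTRY, SUM(IMP) AS DAY_IMP
FROM HOURLY_IMP
WHERE ADID = 1
  AND COUNTRY = 'US'
  AND HOURID BETWEEN 2015062301 AND 2015062324
GROUP BY ADID, COUNTRY;

-- Model 2: wide daily row; a single-row point lookup that sums 24 columns.
SELECT ADID, COUNTRY,
       COALESCE(IMP01, 0) + COALESCE(IMP02, 0)
       -- ... continue the sum through IMP23 ...
       + COALESCE(IMP24, 0) AS DAY_IMP
FROM DAILY_IMP
WHERE ADID = 1 AND COUNTRY = 'US' AND DAYID = 20150623;

Either way the read stays within a single (ADID, COUNTRY) prefix, which is probably why the latency difference looks small; the wide row mainly saves rowkey overhead and row count rather than changing the access pattern.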