Hi Puneet,

Upsert could be easy -
e.g. if we want to load the hourly aggregate for 2015062412:

---------------------------------------------------
*ADID | COUNTRY | HOUR ID*    |  CF.IMP  |
---------------------------------------------------
1     | US      | 2015062412  |  2000    |
---------------------------------------------------

Then the upsert will be -

UPSERT INTO myTABLE (ADID, COUNTRY, DAYID, IMP12)
VALUES (1, 'US', 20150624, 2000);
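
For reference, a rough sketch of the DDL that upsert assumes (the column
types are my guesses; the table, column family and column names follow the
examples in this thread):

CREATE TABLE myTABLE (
    ADID     BIGINT  NOT NULL,
    COUNTRY  VARCHAR NOT NULL,
    DAYID    INTEGER NOT NULL,          -- e.g. 20150624
    -- one impression counter per hour, all in column family CF
    CF.IMP01 BIGINT, CF.IMP02 BIGINT, CF.IMP03 BIGINT, CF.IMP04 BIGINT,
    CF.IMP05 BIGINT, CF.IMP06 BIGINT, CF.IMP07 BIGINT, CF.IMP08 BIGINT,
    CF.IMP09 BIGINT, CF.IMP10 BIGINT, CF.IMP11 BIGINT, CF.IMP12 BIGINT,
    CF.IMP13 BIGINT, CF.IMP14 BIGINT, CF.IMP15 BIGINT, CF.IMP16 BIGINT,
    CF.IMP17 BIGINT, CF.IMP18 BIGINT, CF.IMP19 BIGINT, CF.IMP20 BIGINT,
    CF.IMP21 BIGINT, CF.IMP22 BIGINT, CF.IMP23 BIGINT, CF.IMP24 BIGINT
    CONSTRAINT PK PRIMARY KEY (ADID, COUNTRY, DAYID));

Each hourly load then writes only the IMPxx cell for that hour of the
existing row.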


Your point is true that HBase will still store 24 key-values. One of the
eBay tech blogs puts it this way - think of your data as a sorted map of
sorted maps:



*SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>*
If we look at it that way, I guess keeping 24 columns does make a
difference compared with putting the hour ID in the row key.
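
To make the scan difference concrete, a daily report would look roughly
like this against each model (HOURLY_IMP is just an illustrative name for
the hour-keyed table; the other names follow the examples in this thread):

-- hour ID in the row key: range scan over up to 24 rows per ADID/COUNTRY
SELECT ADID, COUNTRY, SUM(IMP) AS DAILY_IMP
FROM HOURLY_IMP
WHERE ADID = 1 AND COUNTRY = 'US'
  AND HOUR_ID BETWEEN 2015062400 AND 2015062423
GROUP BY ADID, COUNTRY;

-- 24 columns per day: a single-row lookup, summing the hour columns
SELECT ADID, COUNTRY, DAYID,
       COALESCE(IMP01, 0) + COALESCE(IMP02, 0) + COALESCE(IMP03, 0)
     + COALESCE(IMP04, 0) + COALESCE(IMP05, 0) + COALESCE(IMP06, 0)
     + COALESCE(IMP07, 0) + COALESCE(IMP08, 0) + COALESCE(IMP09, 0)
     + COALESCE(IMP10, 0) + COALESCE(IMP11, 0) + COALESCE(IMP12, 0)
     + COALESCE(IMP13, 0) + COALESCE(IMP14, 0) + COALESCE(IMP15, 0)
     + COALESCE(IMP16, 0) + COALESCE(IMP17, 0) + COALESCE(IMP18, 0)
     + COALESCE(IMP19, 0) + COALESCE(IMP20, 0) + COALESCE(IMP21, 0)
     + COALESCE(IMP22, 0) + COALESCE(IMP23, 0) + COALESCE(IMP24, 0) AS DAILY_IMP
FROM myTABLE
WHERE ADID = 1 AND COUNTRY = 'US' AND DAYID = 20150624;

Either way the region server reads 24 key-values for that advertiser and
country; the second form just packs them under a single row key.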

Thanks
Pari

On 23 June 2015 at 19:10, Puneet Kumar Ojha <puneet.ku...@pubmatic.com>
wrote:

>  Even if you keep 24 columns, HBase will still store them as 24 records
> (key-values) internally.
>
>
>
> There are benefits for certain use cases, like breaking the data into
> column families, and in how updates behave for a particular rowkey if you
> go with 24 columns.
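>
> For example (purely illustrative - the CLK click-count family below is
> hypothetical and only there to show the split), metrics that are read
> together can be kept in their own column family, so a report on one family
> does not have to read the other:
>
> CREATE TABLE AD_DAILY_METRICS (
>     ADID    BIGINT  NOT NULL,
>     COUNTRY VARCHAR NOT NULL,
>     DAYID   INTEGER NOT NULL,
>     IMP.H01 BIGINT,   -- hourly impression counts in family IMP
>     IMP.H02 BIGINT,
>     CLK.H01 BIGINT,   -- hypothetical hourly click counts in family CLK
>     CLK.H02 BIGINT
>     CONSTRAINT PK PRIMARY KEY (ADID, COUNTRY, DAYID));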
>
>
>
>
>
> *From:* Pariksheet Barapatre [mailto:pbarapa...@gmail.com]
> *Sent:* Tuesday, June 23, 2015 7:07 PM
> *To:* user@phoenix.apache.org
> *Subject:* Re: Data Model Suggestion
>
>
>
> Thanks Skanda for your suggestion. If we go with 24 columns, won't it
> reduce the number of rows over a large data set and improve scan
> performance? If I want a daily report, I only have to scan a single
> row.
>
> Your thoughts?
>
> Thanks
>
> Pari
>
>
>
> On 23 June 2015 at 18:57, Skanda <skanda.ganapa...@gmail.com> wrote:
>
>   Hi Pari
>
> For your use-case, having the hour as part of the rowkey should be a
> better design than creating so many columns.
>
> Regards
>
> Skanda
>
>
>
> On Tue, Jun 23, 2015 at 6:49 PM, Pariksheet Barapatre <
> pbarapa...@gmail.com> wrote:
>
>  Hello All,
>
>
>
> This is more of an HBase question, but as I am planning to use Phoenix as
> the access layer, I hope Phoenix users can help me.
>
> I would like to build a time-series data set to get on-the-fly analytics.
>
> This use case is for adTech.
>
> Report - what are the hourly, daily, and weekly impression counts at
> country level for a given advertisement ID (ADID)?
>
> I am doing hourly aggregation and loading into a Phoenix table.
>
> Primary Key - *ADID | COUNTRY | HOUR ID*
>
> ---------------------------------------------------
> *ADID | COUNTRY | HOUR ID*    |  CF.IMP  |
> ---------------------------------------------------
> 1     | US      | 2015062301  |  3000    |
> 1     | US      | 2015062302  |  3421    |
> 1     | UK      | 2015062302  |  1212    |
> ---------------------------------------------------
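>
> In Phoenix DDL this is roughly the following (HOURLY_IMP and the column
> types are just placeholders), with each hourly aggregate loaded by a plain
> upsert:
>
> CREATE TABLE HOURLY_IMP (
>     ADID    BIGINT  NOT NULL,
>     COUNTRY VARCHAR NOT NULL,
>     HOUR_ID BIGINT  NOT NULL,   -- yyyyMMddHH, e.g. 2015062301
>     CF.IMP  BIGINT              -- impression count for that hour
>     CONSTRAINT PK PRIMARY KEY (ADID, COUNTRY, HOUR_ID));
>
> UPSERT INTO HOURLY_IMP (ADID, COUNTRY, HOUR_ID, IMP)
> VALUES (1, 'US', 2015062301, 3000);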
>
> Is it a good schema design, or shall I create an alternate schema as below?
>
> Primary Key - *ADID | COUNTRY | DAY ID*
>
> -----------------------------------------------------------
> *ADID | COUNTRY | DAY ID*   |  CF.IMP01  |  CF.IMP02  |
> -----------------------------------------------------------
> 1     | US      | 20150623  |  3000      |  3421      |
> 1     | UK      | 20150623  |  NULL      |  1212      |
> -----------------------------------------------------------
>
> Here, I have taken the hour part of the hour ID and created 24 columns
> (IMP01 through IMP24).
>
> I have gone through many time-series NoSQL blog posts, and most of the
> authors suggest going with wider rows as above. This will reduce the number
> of rows scanned, but I don't see much difference between the two data
> models in terms of scan latency.
>
> Can anybody please suggest a good approach for my use case?
>
>
>
> Cheers,
>
> Pari
>
>
>
>
>
>
> --
>
> Cheers,
>
> Pari
>



-- 
Cheers,
Pari
