On Wed, Jan 7, 2015 at 9:27 AM, Otis Gospodnetic <[email protected]> wrote:
>
> Data/table layout:
> * HBase is used for storing metrics at different granularities (1min, 5
> min.... - a total of 6 different granularities)
> * It's a multi-tenant system
> * Keys are carefully crafted and include userId + number, where this number
> contains the time and the granularity
> * Everything's in 1 table and 1 CF
>
> Access:
> * We only access 1 system at a time, for a specific time range, and
> specific granularity
> * We periodically scan ALL data and delete data older than N days, where N
> varies from user to user
> * We periodically scan ALL data and merge multiple rows (of the same
> granularity) into 1
>

Are you having a problem, Otis, that you are trying to solve?

> Question:
> Would there be any advantage in having 6 tables - one for each granularity
> - instead of having everything in 1 table?
>

It could make for less rewriting of data. If all in the one table, a
compaction will rewrite all granularities. If separate tables, the coarser
granularities would change less often so would flush/compact -- be
rewritten -- less often.

You might get a similar effect if you put in place a split policy that
splits regions on a granularity border; e.g. have all the 1minutes in one
region and anything at a coarser range goes into a different region.

Do you have a notion of the relative proportions of the different
granularities? (e.g. is the coarsest granularity 10% or an irrelevant
0.0001%?)

Otherwise, as @tsuna says and, yeah, what @nick says regarding compaction;
might be worth exploring... could save you a bunch of churn.

St.Ack

> Assume each table would still have just 1 CF and the keys would remain the
> same.
>
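
For what it's worth, here is a rough sketch of what "split on a granularity border" could mean. The key layout is purely an assumption for illustration, since the thread only says keys are userId + a number carrying time and granularity. It assumes an 8-byte user id, then a 1-byte granularity code, then an 8-byte timestamp, all big-endian so that byte order matches numeric order. The names make_key and snap_to_granularity_border are hypothetical and are not HBase APIs. Since userId leads the key, the border in this sketch is per user: snapping a candidate split key back to its (user, granularity) prefix keeps each such run of rows inside a single region.

```python
import struct

# ASSUMED layout, not taken from the thread:
#   8-byte user id | 1-byte granularity code | 8-byte epoch seconds
# Big-endian packing keeps lexicographic byte order equal to numeric order,
# which is how row keys are compared (unsigned, byte by byte).
USER_LEN = 8
PREFIX_LEN = USER_LEN + 1  # user id plus the granularity byte


def make_key(user_id: int, granularity: int, epoch_seconds: int) -> bytes:
    """Build a row key under the assumed layout (hypothetical helper)."""
    return struct.pack(">QBQ", user_id, granularity, epoch_seconds)


def snap_to_granularity_border(candidate: bytes) -> bytes:
    """Truncate a candidate split key to its (user, granularity) prefix.

    Every row sharing that prefix sorts at or after the truncated key, so a
    region boundary placed there never cuts a (user, granularity) run in two.
    """
    return candidate[:PREFIX_LEN]


if __name__ == "__main__":
    # A midpoint key some split logic might have picked (made-up values).
    candidate = make_key(42, 1, 1_420_622_820)
    border = snap_to_granularity_border(candidate)

    # Rows of the same user at a finer granularity code stay left of the border.
    assert make_key(42, 0, 2**64 - 1) < border
    # Every row of (user 42, granularity 1) lands on the right-hand side.
    assert make_key(42, 1, 0) >= border
    print(border.hex())
```

If the granularity were the leading part of the key instead, the same truncation would put whole granularities in separate regions, but that would change the key design, which the original question wanted to keep as is.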
