Hi,

What Nick suggests below about using a Compaction Coprocessor sounds potentially very
useful for us. Q below.
On Wed, Jan 7, 2015 at 8:21 PM, Nick Dimiduk <[email protected]> wrote:

> Not to dig too deep into ancient history, but Tsuna's comments are mostly
> still relevant today, except for...
>
> > You also generally end up with fewer, bigger regions, which is almost
> > always better. This entails that your RS are writing more data to fewer
> > WALs, which leads to more sequential writes across the board. You'll end
> > up with fewer HLogs, which is also a good thing.
>
> HBase is one WAL per region server and has been for as long as I've paid
> attention. Unless I've missed something, the number of tables doesn't
> change this fixed number.
>
> > If you use HBase's client (which is most likely the case as the only
> > other alternative is asynchbase), beware that you need to create one
> > HTable instance per table per thread in your application code.
>
> You can still write your client application this way, but the preferred
> idiom is to use a single Connection instance from which all these
> resources are shared across HTable instances. This pattern is reinforced
> in the new client API introduced in 1.0.
>
> FYI, I think you can write a Compaction coprocessor that implements your
> data expiration policy through normal compaction operations, thereby
> removing the necessity of the (expensive?) scan + write delete pattern
> entirely.

We actually do 2 types of full scans:

1) scan everything and delete rows older than N days, where N can be
different for different users
2) scan everything and merge multiple rows into 1 row via HBaseHUT -
https://github.com/sematext/HBaseHUT

2) is more expensive than 1). I'm wondering if we could use a Compaction
Coprocessor for 2)? HBaseHUT needs to be able to grab N rows and merge them
into 1, delete those N rows, and just write that 1 new row. This N could be
several thousand rows. Could a Compaction Coprocessor really be used for
that? Also, would that come into play during minor or major compactions, or
both?
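To make the question more concrete, below is a rough, untested sketch of what I
imagine for the simpler case, 1) (per-user expiration), written against the
0.98 / 1.0-era RegionObserver API. The class name and the ttlForUser() helper are
made up, and the exact method signatures may differ between HBase versions. For
case 2) I assume the same wrapper would additionally have to buffer all the cells
of the N rows being merged and emit the merged row's cells instead, which is
exactly the part I'm not sure is practical for several thousand rows.

// Untested sketch, HBase 0.98 / 1.0-era API. PerUserTtlCompactionObserver and
// ttlForUser() are hypothetical names; adjust signatures for your HBase version.
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.ScanType;
import org.apache.hadoop.hbase.regionserver.Store;

public class PerUserTtlCompactionObserver extends BaseRegionObserver {

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Store store, final InternalScanner scanner, ScanType scanType) throws IOException {
    // Wrap the compaction scanner: every cell the compaction rewrites passes through
    // next(), so cells we drop here never make it into the new HFile.
    return new InternalScanner() {
      @Override
      public boolean next(List<Cell> results) throws IOException {
        boolean more = scanner.next(results);
        dropExpired(results);
        return more;
      }

      @Override
      public boolean next(List<Cell> results, int limit) throws IOException {
        boolean more = scanner.next(results, limit);
        dropExpired(results);
        return more;
      }

      private void dropExpired(List<Cell> results) {
        long now = System.currentTimeMillis();
        Iterator<Cell> it = results.iterator();
        while (it.hasNext()) {
          Cell cell = it.next();
          // The userId is a prefix of our row key, so the per-user retention can be
          // derived from the row itself.
          long maxAgeMs = ttlForUser(CellUtil.cloneRow(cell));
          if (now - cell.getTimestamp() > maxAgeMs) {
            it.remove();
          }
        }
      }

      @Override
      public void close() throws IOException {
        scanner.close();
      }
    };
  }

  // Hypothetical helper: parse the userId prefix out of the row key and look up that
  // user's retention, e.g. from a cached config. Hard-coded to 30 days here.
  private long ttlForUser(byte[] rowKey) {
    return 30L * 24 * 60 * 60 * 1000;
  }
}

(I believe the same hook fires for both minor and major compactions, with the
ScanType argument indicating which, but I'd love confirmation on that.)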
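Separately, on the single shared Connection idiom mentioned above: just to
double-check that we understand the 1.0 client API correctly, is the pattern
below roughly what's intended? (Untested sketch; the "metrics" table name and
the row key are just placeholders for our setup.)

// Untested sketch of the shared-Connection idiom as we understand the 1.0 client API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // One heavyweight Connection per process, shared by all threads.
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      // Lightweight Table instances are obtained per use (or per thread) and closed
      // when done; they all share the Connection's resources underneath.
      try (Table metrics = connection.getTable(TableName.valueOf("metrics"))) {
        Result result = metrics.get(new Get(Bytes.toBytes("userId42-20150107-1min")));
        System.out.println(result.isEmpty() ? "no such row" : result);
      }
    }
  }
}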
Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

> -n
>
> On Wed, Jan 7, 2015 at 9:27 AM, Otis Gospodnetic <
> [email protected]> wrote:
>
> > Hi,
> >
> > It's been asked before, but I didn't find any *definite* answers, and a
> > lot of the answers I found are from a whiiiile back.
> >
> > e.g. Tsuna provided pretty convincing info here:
> > http://search-hadoop.com/m/xAiiO8ttU2/%2522%2522I+generally+recommend+to+stick+to+a+single+table%2522&subj=Re+One+table+or+multiple+tables+
> >
> > ... but that is from 3 years ago. Maybe things changed?
> >
> > Here's our use case:
> >
> > Data/table layout:
> > * HBase is used for storing metrics at different granularities (1 min,
> > 5 min, ... - a total of 6 different granularities)
> > * It's a multi-tenant system
> > * Keys are carefully crafted and include userId + a number, where this
> > number encodes the time and the granularity
> > * Everything's in 1 table and 1 CF
> >
> > Access:
> > * We only access 1 system at a time, for a specific time range and a
> > specific granularity
> > * We periodically scan ALL data and delete data older than N days,
> > where N varies from user to user
> > * We periodically scan ALL data and merge multiple rows (of the same
> > granularity) into 1
> >
> > Question:
> > Would there be any advantage in having 6 tables - one for each
> > granularity - instead of having everything in 1 table?
> > Assume each table would still have just 1 CF and the keys would remain
> > the same.
> >
> > Thanks,
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
