> Okay, I see. But isn't there a big scaling issue here?
> Imagine that I am the developer of a very successful website. In
> year 1 I need 20 CFs and might need 8 GB of RAM. In year 2 I need 50 CFs
> because I added functionality to my wonderful website; will I need 20 GB of
> RAM? And if in year 3 I have 300 column families, will I need 120 GB of
> RAM per node? Or did I miss something about memory consumption?

It's up to you to size the memtable thresholds appropriately. The
primary driver for memtable threshold size is the desire to avoid
future compaction work by making the flushed memtables larger. As
such, a larger memtable threshold is typically only particularly
relevant for column families that see a lot of writes.

So, if you have 50 column families out of which 2 are very frequently
written and the remainder only rarely, there will probably not be any
great motivation to have any significant memtable thresholds for the
remainder.
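
To put rough numbers on that, here is a minimal sketch in Python. The
column family names and the 256 MB / 16 MB thresholds are made up for
illustration, and the sum only counts the configured thresholds; real
heap use will differ because of per-memtable overhead and memtables
waiting to be flushed.

```python
# Hypothetical per-column-family memtable thresholds, in MB.
# Names and sizes are illustrative assumptions, not recommended values.

def total_memtable_mb(thresholds_mb):
    """Sum the configured thresholds across all column families."""
    return sum(thresholds_mb.values())

# Two frequently written column families get large thresholds...
thresholds = {"hot_cf_1": 256, "hot_cf_2": 256}
# ...and the 48 rarely written ones get small thresholds.
thresholds.update({f"cold_cf_{i}": 16 for i in range(48)})

total = total_memtable_mb(thresholds)
print(len(thresholds), "column families,", total, "MB of memtable thresholds")
```

The total follows the sum of the thresholds you pick, not the number
of column families multiplied by the largest threshold.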

If you truly have a lot of column families, all of which receive an
equal amount of traffic, then to some extent it is a scaling issue, in
the sense that you'd be forced to use lower memtable thresholds for
each column family than you would otherwise. The result of that is
additional compaction work (meaning less sustainable write
throughput). But you won't be forced to have 120 GB nodes (a 120 GB
heap would be problematic for other reasons anyway).
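
As a back-of-the-envelope illustration of that trade-off, the sketch
below splits a fixed memory budget evenly across the column families.
The 4096 MB budget and both function names are assumptions made for
the example, and it ignores memtable overhead; it only shows that the
per-CF threshold shrinks, and the number of flushes grows, as column
families are added.

```python
# Assumed fixed budget for all memtables on a node, in MB (illustrative).
BUDGET_MB = 4096

def per_cf_threshold_mb(budget_mb, n_cfs):
    """Threshold each column family gets if the budget is split evenly."""
    return budget_mb / n_cfs

def flushes_per_gb_written(threshold_mb):
    """Memtable flushes needed per GB (1024 MB) written to one column family."""
    return 1024 / threshold_mb

for n_cfs in (20, 50, 300):
    threshold = per_cf_threshold_mb(BUDGET_MB, n_cfs)
    flushes = flushes_per_gb_written(threshold)
    print(f"{n_cfs} CFs: {threshold:.1f} MB per CF, "
          f"{flushes:.1f} flushes per GB written")
```

The memory stays constant in all three cases; what grows is the
number of smaller flushed memtables that compaction later has to
merge.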

-- 
/ Peter Schuller
