> Okay, I see. But isn't there a big issue for scaling here?
> Imagine that I am the developer of a certain very successful website: in
> year 1 I need 20 CF. I might need to have 8 GB of RAM. In year 2 I need 50 CF
> because I added functionality to my wonderful website. Will I need 20 GB of
> RAM? And if in year 3 I have 300 column families, will I need 120 GB of
> RAM per node? Or did I miss something about memory consumption?

It's up to you to size the memtable thresholds appropriately. The primary driver for memtable threshold size is the desire to avoid future compaction work by making the flushed memtables larger. As such, a larger memtable threshold is typically only relevant for column families that see a lot of writes.

So, if you have 50 column families, of which 2 are written very frequently and the remainder only rarely, there will probably be little motivation to set significant memtable thresholds for the remainder.

If you truly have a lot of column families, all of which receive an equal amount of traffic, then to some extent it is a scaling issue, in the sense that you'd be forced to use a lower memtable threshold for each column family than you otherwise would. The result is additional compaction work (meaning lower sustainable write throughput).

But you won't be forced to have 120 GB nodes (a 120 GB heap would be problematic for other reasons anyway).

-- 
/ Peter Schuller
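
P.S. To make the sizing argument concrete, here is a rough sketch of how one might split a fixed memtable budget across column families in proportion to their write rates. Everything in it is an assumption for illustration: the function name, the `floor_mb` minimum, the write-rate figures and the 1024 MB budget are all made up, and it does not read or set any actual Cassandra option. It also ignores whatever heap overhead memtables carry beyond their configured threshold.

```python
def memtable_thresholds_mb(write_rates, total_budget_mb, floor_mb=1.0):
    """Split a fixed memtable budget across column families by write rate.

    write_rates: dict of CF name -> relative write rate (hypothetical numbers).
    total_budget_mb: total memory you are willing to spend on memtables.
    floor_mb: small minimum threshold given to every CF (an assumption).
    """
    count = len(write_rates)
    remaining = total_budget_mb - floor_mb * count
    if remaining < 0:
        raise ValueError("budget too small for the per-CF floor")

    total_rate = sum(write_rates.values())
    thresholds = {}
    for name, rate in write_rates.items():
        # With no writes anywhere, fall back to an even split.
        share = rate / total_rate if total_rate else 1.0 / count
        thresholds[name] = floor_mb + remaining * share
    return thresholds


if __name__ == "__main__":
    # 50 CFs: 2 written heavily, 48 written rarely (made-up rates).
    rates = {"hot_%d" % i: 1000 for i in range(2)}
    rates.update({"cold_%d" % i: 1 for i in range(48)})
    for name, mb in sorted(memtable_thresholds_mb(rates, 1024).items()):
        print("%-8s %8.1f MB" % (name, mb))
```

The point of the sketch is that the total stays at whatever budget you pick regardless of how many column families there are. With skewed traffic, nearly all of the budget goes to the few hot column families. With 300 equally busy column families, each one simply gets a smaller threshold, which is where the extra compaction work comes from.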