The "conventional wisdom" says that it's ideal to keep the number of tables
in Cassandra "in the low hundreds", as each table can use 1MB or so of
heap.  So if you have 1000 tables you'd have roughly 1GB of heap used
(which is no fun).
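
As a back-of-envelope check, here's that arithmetic as a tiny Python sketch.
The 1MB-per-table figure is just the rough number quoted above, not a
measured value, and estimated_table_heap_mb is a made-up helper name.

```python
# Back-of-envelope heap estimate using the rough ~1MB-per-table figure.
# per_table_mb is an assumption taken from the "conventional wisdom",
# not a measured Cassandra number.


def estimated_table_heap_mb(num_tables: int, per_table_mb: float = 1.0) -> float:
    """Return the approximate heap (in MB) used by per-table overhead."""
    if num_tables < 0:
        raise ValueError("num_tables must be non-negative")
    return num_tables * per_table_mb


if __name__ == "__main__":
    for n in (100, 300, 1000):
        print(f"{n:>5} tables -> ~{estimated_table_heap_mb(n):.0f} MB of heap")
```

At 1000 tables that works out to ~1000MB, i.e. about a gigabyte of heap
spent on table overhead before any actual data is cached.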

But is this an issue with the tables themselves or the SSTables?

I suspect the root of this is the SSTables, since the arena overhead applies
to the SSTables too, and more SSTables means more overhead.

So by adding more tables, you end up with more SSTables, which means more
heap memory.

If I'm correct, then Cassandra could benefit from table partitioning,
where all values in a specific range are routed to a specific set of
tables.

So if you were storing log data, you could store it in hourly or daily
partitions, but view the table as one logical unit.
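
Here's a rough sketch of what the client-side routing could look like,
assuming a hypothetical naming scheme of one physical table per day
("logs_YYYYMMDD") or per hour ("logs_YYYYMMDDHH"). None of this is a
Cassandra feature; partition_table and tables_for_range are made-up helpers
that show how writes get routed and how a time-range query would fan out
over the daily tables that together make up the "logical" table.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical prefix for the partitioned log tables (an assumption).
TABLE_PREFIX = "logs"


def partition_table(ts: datetime, granularity: str = "daily") -> str:
    """Map a timezone-aware timestamp to the physical table that holds it."""
    ts = ts.astimezone(timezone.utc)
    if granularity == "daily":
        suffix = ts.strftime("%Y%m%d")
    elif granularity == "hourly":
        suffix = ts.strftime("%Y%m%d%H")
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{TABLE_PREFIX}_{suffix}"


def tables_for_range(start: datetime, end: datetime) -> list[str]:
    """List the daily tables a query over [start, end] must read.

    Both bounds must be timezone-aware. The caller queries each table and
    merges the results, treating them as one logical table.
    """
    tables = []
    day = start.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    while day <= end:
        tables.append(partition_table(day))
        day += timedelta(days=1)
    return tables


if __name__ == "__main__":
    start = datetime(2014, 6, 1, 22, 0, tzinfo=timezone.utc)
    end = datetime(2014, 6, 3, 1, 0, tzinfo=timezone.utc)
    print(partition_table(start))            # logs_20140601
    print(partition_table(start, "hourly"))  # logs_2014060122
    print(tables_for_range(start, end))
```

The trade-off is that range queries now have to hit several tables, so the
granularity (hourly vs. daily) is a balance between how fine-grained you
want the drops to be and how many tables each query touches.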

The benefit here is that you could easily just drop the oldest data.  So if
you need to clean up data, you wouldn't have to drop the whole table, just
a day's worth of the data.

And since that day is just one SSTable on disk, the drop would be easy: no
tombstones, just delete the whole SSTable.
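
A minimal sketch of the retention side, assuming the same kind of
hypothetical daily tables named "logs_YYYYMMDD" living in a made-up keyspace
called "logging". expired_tables and drop_statements are illustrative
helpers; the only real Cassandra piece is the CQL DROP TABLE IF EXISTS
statement they generate.

```python
from datetime import date, datetime, timedelta


def expired_tables(existing: list[str], today: date, keep_days: int,
                   prefix: str = "logs") -> list[str]:
    """Return the daily tables older than the retention window, oldest first.

    Names that don't match the hypothetical "<prefix>_YYYYMMDD" scheme
    (including hourly tables) are ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "_"):
            continue
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y%m%d").date()
        except ValueError:
            continue  # not a daily table name
        if day < cutoff:
            expired.append((day, name))
    return [name for _, name in sorted(expired)]


def drop_statements(tables: list[str], keyspace: str = "logging") -> list[str]:
    """Build one CQL DROP TABLE statement per expired table."""
    return [f"DROP TABLE IF EXISTS {keyspace}.{t};" for t in tables]


if __name__ == "__main__":
    existing = ["logs_20140528", "logs_20140530", "logs_20140601", "other"]
    old = expired_tables(existing, today=date(2014, 6, 2), keep_days=3)
    for stmt in drop_statements(old):
        print(stmt)
```

One caveat: with Cassandra's default auto_snapshot setting, DROP TABLE takes
a snapshot first, so the disk space only comes back once that snapshot is
cleared.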

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
