Hi Folks,

Ideally, multitenancy would be a first-class citizen in Cassandra. But as
of today, the easiest way to get multitenancy (on-disk data isolation,
per-tenant recovery & backup, per-tenant replication strategy) is to have
one keyspace per tenant. However, it's not recommended to go beyond 300 to
500 tables in one cluster today.
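To make the one-keyspace-per-tenant pattern concrete, here is a minimal sketch of the DDL a provisioning tool could generate per tenant. The tenant name, datacenter name, and helper class are made-up placeholders, not anything from Cassandra itself:

```java
// Sketch: generating per-tenant keyspace DDL with NetworkTopologyStrategy.
// Tenant and datacenter names here are hypothetical placeholders.
public class TenantKeyspaces {
    // Each tenant gets its own keyspace, which gives it its own on-disk
    // directory and its own NTS replication settings.
    static String createKeyspaceCql(String tenant, String dc, int rf) {
        return String.format(
            "CREATE KEYSPACE IF NOT EXISTS tenant_%s " +
            "WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %d};",
            tenant, dc, rf);
    }

    public static void main(String[] args) {
        System.out.println(createKeyspaceCql("acme", "DC1", 3));
    }
}
```

With a few thousand tenants, each such keyspace contributes its tables to the cluster-wide table count, which is exactly where the 300-500 table guidance starts to bite.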

With this thread, I would like to identify the current blocking issues for
supporting a high number of tables (10K/50K/?), and contribute the fixes.
I'm also open to any ideas for making the keyspace itself tenant-aware and
supporting multitenancy out of the box, but having a replication strategy
(NTS) per tenant & on-disk data isolation are the minimal features to have.
Not sure, but supporting a high number of tables in a cluster may lead us
to support multitenancy out of the box in the future.

As per my quick discussion with Jonathan & a few other folks, I think we
already know about the issues below:

 - 1 MB of heap per memtable
 - Creating CFs can take a long time (fixed in CASSANDRA-6977)
 - Multiple flushes turn writes from sequential into random (should we
   worry if we use SSDs?)
 - Unknowns!

Regarding '1 MB per memtable', CASSANDRA-5935 adds an option to disable
slab allocation so more CFs can be packed in, but at the cost of GC pain.
It seems like the off-heap memtables in Cassandra 2.1 will be a better
option. However, it looks like they also use region-based memory allocation
to avoid fragmentation. Does this mean no GC pain, but still a high RAM
requirement (for 50K tables, ending up with ~50 GB)?
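The back-of-the-envelope math behind that number can be checked quickly, assuming one pre-allocated 1 MiB region per memtable and one active memtable per table (a simplification; a busy table can have more than one):

```java
// Back-of-the-envelope check: one 1 MiB region per memtable, one memtable
// per table. The regions pin this much RAM whether they live on-heap or
// off-heap; off-heap only moves the pressure out of the GC's view.
public class MemtableFootprint {
    static final long REGION_SIZE = 1024 * 1024; // 1 MiB, as in NativeAllocator

    static long minFootprintBytes(int tableCount) {
        return tableCount * REGION_SIZE;
    }

    public static void main(String[] args) {
        long bytes = minFootprintBytes(50_000);
        // 50,000 * 1 MiB = ~48.8 GiB (~52 GB decimal) as a floor
        System.out.printf("50K tables -> %.1f GiB minimum%n",
                          bytes / (1024.0 * 1024 * 1024));
    }
}
```

So the ~50 GB figure is simply the floor imposed by region pre-allocation, before any actual data is written.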

>> (please correct me if this is not the right file I'm looking into)

public class NativeAllocator extends MemtableAllocator
{
    private static final Logger logger = LoggerFactory.getLogger(NativeAllocator.class);

    private final static int REGION_SIZE = 1024 * 1024;
    private final static int MAX_CLONED_SIZE = 128 * 1024; // bigger than this don't go in the region
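For what it's worth, my understanding of the region-based scheme above is a bump-pointer allocator: each memtable grabs a fixed-size region and hands out slices until it is exhausted. This is a simplified illustration of that idea, not the actual Cassandra implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rough sketch of region-based (bump-pointer) allocation, in the spirit
// of NativeAllocator above. NOT the real Cassandra code: the real
// allocator uses off-heap memory and chains multiple regions.
public class RegionSketch {
    static final int REGION_SIZE = 1024 * 1024;     // one region per memtable
    static final int MAX_CLONED_SIZE = 128 * 1024;  // larger values bypass the region

    private final byte[] region = new byte[REGION_SIZE];
    private final AtomicInteger nextOffset = new AtomicInteger(0);

    // Returns the offset of the reserved slice, or -1 if the value is
    // oversized or the region is full.
    int allocate(int size) {
        if (size > MAX_CLONED_SIZE)
            return -1; // oversized values are allocated separately
        while (true) {
            int off = nextOffset.get();
            if (off + size > REGION_SIZE)
                return -1; // region exhausted; a real allocator grabs a new one
            if (nextOffset.compareAndSet(off, off + size))
                return off; // bump-pointer advance succeeded
        }
    }
}
```

The upside is no per-value fragmentation and cheap allocation; the downside, for our discussion, is that every memtable pays for a whole region up front, which is exactly the per-table fixed cost that hurts at 10K+ tables.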

I would like to know about any other known issues that I haven't listed
here and/or any recommendations for multitenancy. Also, any thoughts on
supporting an efficient off-heap allocator option for a high number of
tables?

BTW, having 10K tables brings up many other issues around management,
tooling, etc., but I'm less worried about those at this point.

Thanks,
Jay
