Hi Folks,

Ideally, multi-tenancy would be a first-class citizen in Cassandra. But as of today, the easiest way to get multi-tenancy (on-disk data isolation, per-tenant recovery & backup, per-tenant replication strategy) is to have one keyspace per tenant. However, the general recommendation today is not to go beyond 300 to 500 tables in one cluster.
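For concreteness, here is a minimal sketch of the keyspace-per-tenant pattern using the DataStax Java driver; the tenant name (tenant_acme), datacenter name (dc1), and replication factor are placeholder assumptions of mine:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TenantKeyspace
    {
        public static void main(String[] args)
        {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // One keyspace per tenant: its own SSTable directories on disk and
            // its own NTS replication settings (here RF 3 in datacenter dc1).
            session.execute("CREATE KEYSPACE IF NOT EXISTS tenant_acme WITH replication = "
                          + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");

            cluster.close();
        }
    }

Each tenant's keyspace gets its own on-disk directories and its own NTS settings, which is what gives the isolation and per-tenant backup/recovery mentioned above.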
With this thread, I would like to find out the current blocking issues for supporting a high number of tables (10K? 50K? more?), and to contribute the fixes. I'm also open to any ideas for making the keyspace itself tenant-aware and supporting multi-tenancy out of the box, but a per-tenant replication strategy (NTS) and on-disk data isolation are the minimal features to have. I'm not sure, but supporting a high table count per cluster may lead us toward out-of-the-box multi-tenancy in the future.

From a quick discussion with Jonathan & a few other folks, I think we already know about the issues below:

- 1 MB of heap per memtable
- Creating CFs can take a long time (fixed in CASSANDRA-6977)
- Many memtable flushes turn writes from sequential into random I/O (should we worry about this if we use SSDs?)
- Unknowns!

Regarding the 1 MB per memtable: CASSANDRA-5935 adds an option to disable slab allocation so that more CFs can be packed in, but at the cost of GC pain. It seems the off-heap memtables in Cassandra 2.1 will be a better option. However, they also appear to use region-based memory allocation to avoid fragmentation. Does this mean no GC pain, but still a high RAM requirement (for 50K tables, we would end up with ~50 GB)?

(Please correct me if this is not the right file to be looking at:)

    public class NativeAllocator extends MemtableAllocator
    {
        private static final Logger logger = LoggerFactory.getLogger(NativeAllocator.class);

        private final static int REGION_SIZE = 1024 * 1024;
        private final static int MAX_CLONED_SIZE = 128 * 1024; // bigger than this don't go in the region

I would like to know about any other known issues that I've not listed here, and any recommendations for multi-tenancy. Also, any thoughts on supporting an efficient off-heap allocator option for a high number of tables?

BTW, having 10K tables brings up many other issues around management, tooling, etc., but I'm less worried about those at this point.

Thanks,
Jay
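P.S. To make the 50K-tables-to-50GB arithmetic above concrete, here is a back-of-the-envelope sketch. It assumes the floor of exactly one active memtable per table and one region per memtable (real usage would be higher once memtables fill up); the class name and table count are mine:

    public class RegionFootprint
    {
        // Same value as NativeAllocator's REGION_SIZE: 1 MiB per region.
        private static final long REGION_SIZE = 1024 * 1024;

        public static void main(String[] args)
        {
            long tables = 50_000L;                // hypothetical table count
            long minBytes = tables * REGION_SIZE; // one region per memtable = the floor
            System.out.printf("Minimum off-heap for %d tables: %.1f GiB%n",
                              tables, minBytes / (1024.0 * 1024 * 1024));
            // Prints ~48.8 GiB, i.e. roughly the 50 GB figure above. Busy
            // memtables allocate more than one region, so this is a lower bound.
        }
    }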