There are two categorically distinct forms of multi-tenancy: 1) you control the apps and simply want client data isolation, and 2) the clients have their own apps, access the cluster directly, and rely on access control at the table level to isolate their data.
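The first model is usually handled by leading every table's partition key with a tenant ID, so each tenant's rows live in their own partitions and the application layer scopes every query. A minimal sketch, with a hypothetical schema and helper (statements are built as parameterized strings here; a real deployment would use a Cassandra driver with prepared statements):

```python
# Hypothetical shared "events" table whose partition key leads with
# tenant_id, so all of one tenant's rows for a day share a partition.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    tenant_id text,
    event_day date,
    event_id  timeuuid,
    payload   text,
    PRIMARY KEY ((tenant_id, event_day), event_id)
);
"""

def events_for_day(tenant_id, day):
    """Build a parameterized CQL statement restricted to one tenant's partition."""
    cql = ("SELECT event_id, payload FROM events "
           "WHERE tenant_id = %s AND event_day = %s")
    return cql, (tenant_id, day)

stmt, params = events_for_day("acme", "2014-12-03")
```

Because the tenant ID is always bound by the application layer, a client can never address another tenant's partitions through this path.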
Using a tenant ID in the partition key is the preferred approach and works well for the first use case, but it doesn't provide the strict data isolation needed for the second. Still, use that first approach if you can. Also consider an application layer that mediates between the tenant clients and the cluster, supplying the tenant ID in the partition key. That adds an extra hop for data access, but it is a cleaner design.

If you really do need to maintain separate tables and keyspaces, use what I call "sharded clusters" – multiple, independent clusters with a hash on the user/tenant ID to select which cluster to use – but limit each cluster to the low hundreds of tables. It is worth noting that if each tenant needs to be isolated anyway, there is no need to store independent tenants on the same cluster. You will have to do your own proof-of-concept implementation to determine what table limit works best for your use case.

-- Jack Krupansky

From: Raj N
Sent: Wednesday, December 3, 2014 4:54 PM
To: user@cassandra.apache.org
Subject: Re: Keyspace and table/cf limits

The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do you think that's reasonable to support? I know that key cache capacity is still reserved on-heap. Any plans to move it off-heap?

-Raj

On Tue, Nov 25, 2014 at 3:10 PM, Robert Coli <rc...@eventbrite.com> wrote:

On Tue, Nov 25, 2014 at 9:07 AM, Raj N <raj.cassan...@gmail.com> wrote:

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

The most relevant recent changes would be:

https://issues.apache.org/jira/browse/CASSANDRA-6689
https://issues.apache.org/jira/browse/CASSANDRA-6694

These should meaningfully reduce the amount of heap that memtables consume. That heap can then be used to support more of the heap-persistent structures associated with many CFs; I have no idea how to estimate the scale of the improvement. As a general/meta statement, Cassandra is very multi-threaded and consumes file handles like crazy. How many different query cases do you really want to put on one cluster/node? ;D

=Rob
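The "sharded clusters" routing Jack describes can be sketched as a stable hash of the tenant ID selecting one of several independent clusters. A minimal sketch, with hypothetical contact points; note that a stable digest is used rather than Python's built-in `hash()`, which is randomized per process:

```python
import hashlib

# Hypothetical contact points for three independent Cassandra clusters.
CLUSTERS = [
    ["10.0.1.1", "10.0.1.2"],  # cluster 0
    ["10.0.2.1", "10.0.2.2"],  # cluster 1
    ["10.0.3.1", "10.0.3.2"],  # cluster 2
]

def cluster_for_tenant(tenant_id):
    """Map a tenant ID to a cluster index via a stable hash."""
    digest = hashlib.md5(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(CLUSTERS)

def contact_points(tenant_id):
    """Contact points of the cluster that holds this tenant's data."""
    return CLUSTERS[cluster_for_tenant(tenant_id)]
```

A given tenant always routes to the same cluster, so its keyspaces and tables live in exactly one place; the trade-off is that simple modulo hashing remaps tenants if clusters are added, so growing the fleet later calls for consistent hashing or an explicit tenant-to-cluster directory.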