There are two categorically distinct forms of multi-tenancy: 1) you control the 
apps and simply want client data isolation, and 2) the clients have their own 
apps, access the cluster directly, and rely on access control at the table 
level to isolate their data.

Using a tenant ID in the partition key is the preferred approach and works well 
for the first use case, but it doesn’t provide the strict data isolation needed 
for the second use case. Still, try to use that first approach if you can.
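
For example, a minimal sketch of that first approach, using the DataStax Python 
driver and made-up keyspace/table names, might look like this:

    # A minimal sketch (keyspace/table names are made up): one shared set of
    # tables, keyed by tenant_id in the partition key, serves all tenants.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS shared_app
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS shared_app.events (
            tenant_id text,
            event_id  timeuuid,
            payload   text,
            PRIMARY KEY ((tenant_id), event_id)
        )
    """)

    # Every read and write is scoped by tenant_id in the partition key.
    insert = session.prepare(
        "INSERT INTO shared_app.events (tenant_id, event_id, payload) "
        "VALUES (?, now(), ?)")
    session.execute(insert, ("acme-corp", "login"))

    select = session.prepare(
        "SELECT event_id, payload FROM shared_app.events WHERE tenant_id = ?")
    rows = session.execute(select, ("acme-corp",))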

You should also consider an application layer that sits between the tenant 
clients and the cluster and supplies the tenant ID in the partition key. That 
does add an extra hop for data access, but it is a cleaner design.
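
A minimal sketch of such a layer, again with made-up names and assuming the 
Python driver, is just a thin service that owns the session and injects the 
tenant ID from the authenticated request context:

    # Hypothetical application layer: tenant clients call this service,
    # never the cluster directly, and the service supplies the tenant_id.
    from cassandra.cluster import Cluster

    class TenantDataService:
        def __init__(self, contact_points):
            self._cluster = Cluster(contact_points)
            self._session = self._cluster.connect("shared_app")
            self._insert = self._session.prepare(
                "INSERT INTO events (tenant_id, event_id, payload) "
                "VALUES (?, now(), ?)")
            self._select = self._session.prepare(
                "SELECT event_id, payload FROM events WHERE tenant_id = ?")

        def record_event(self, tenant_id, payload):
            # tenant_id comes from the authenticated request context,
            # not from anything the tenant client puts in the query.
            self._session.execute(self._insert, (tenant_id, payload))

        def list_events(self, tenant_id):
            return list(self._session.execute(self._select, (tenant_id,)))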

If you really do need to maintain separate tables and keyspaces, use what I 
call “sharded clusters”: multiple, independent clusters with a hash on the 
user/tenant ID to select which cluster to use, but limit each cluster to the 
low hundreds of tables. It is worth noting that if each tenant needs to be 
isolated anyway, there is clearly no need to store independent tenants on the 
same cluster.
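
A sketch of the routing piece, with placeholder contact points and a plain hash 
on the tenant ID, assuming the Python driver:

    # Hypothetical "sharded clusters" routing: hash the tenant ID to pick
    # one of several independent Cassandra clusters.
    import hashlib
    from cassandra.cluster import Cluster

    CLUSTER_CONTACT_POINTS = [
        ["10.0.1.1", "10.0.1.2"],  # cluster 0 (placeholder addresses)
        ["10.0.2.1", "10.0.2.2"],  # cluster 1
        ["10.0.3.1", "10.0.3.2"],  # cluster 2
    ]

    # One session per physical cluster, created up front.
    sessions = [Cluster(cp).connect() for cp in CLUSTER_CONTACT_POINTS]

    def session_for_tenant(tenant_id):
        # Stable hash so a given tenant always lands on the same cluster.
        digest = hashlib.md5(tenant_id.encode("utf-8")).hexdigest()
        return sessions[int(digest, 16) % len(sessions)]

    # Each tenant's keyspaces/tables live only on its assigned cluster.
    session = session_for_tenant("acme-corp")
    session.execute("SELECT release_version FROM system.local")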

You will have to do your own proof-of-concept implementation to determine what 
table limit works best for your use case.
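
A rough skeleton for such a proof of concept, with purely illustrative numbers, 
would grow the table count in batches and watch write latency (and heap/GC on 
the nodes) at each step:

    # Rough proof-of-concept skeleton: keep adding tables and measure how
    # basic writes behave as the table count grows. Numbers are illustrative.
    import time
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS poc
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)

    for n in range(0, 1000, 100):  # grow in batches of 100 tables
        for i in range(n, n + 100):
            session.execute(
                "CREATE TABLE IF NOT EXISTS poc.t_%d "
                "(k text PRIMARY KEY, v text)" % i)

        start = time.perf_counter()
        for i in range(n, n + 100):
            session.execute(
                "INSERT INTO poc.t_%d (k, v) VALUES ('key', 'value')" % i)
        elapsed = time.perf_counter() - start
        print("%d tables: %.3f s for 100 writes" % (n + 100, elapsed))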

-- Jack Krupansky

From: Raj N 
Sent: Wednesday, December 3, 2014 4:54 PM
To: user@cassandra.apache.org 
Subject: Re: Keyspace and table/cf limits

The question is more from a multi-tenancy point of view. We wanted to see if we 
could have a keyspace per client. Each keyspace may have 50 column families, 
and with 200 clients that would be 10,000 column families. Do you think that's 
reasonable to support? I know that key cache capacity is still reserved 
on-heap. Are there any plans to move it off-heap? 

-Raj

On Tue, Nov 25, 2014 at 3:10 PM, Robert Coli <rc...@eventbrite.com> wrote:

  On Tue, Nov 25, 2014 at 9:07 AM, Raj N <raj.cassan...@gmail.com> wrote:

    What's the latest on the maximum number of keyspaces and/or tables that one 
can have in Cassandra 2.1.x?

  Most relevant changes lately would be:

  https://issues.apache.org/jira/browse/CASSANDRA-6689

  and
  https://issues.apache.org/jira/browse/CASSANDRA-6694


  These should meaningfully reduce the amount of heap that memtables consume. 
That heap can then be used to support more heap-persistent structures 
associated with many CFs. I have no idea how to estimate the scale of the 
improvement.

  As a general/meta statement, Cassandra is very multi-threaded, and consumes 
file handles like crazy. How many different query cases do you really want to 
put on one cluster/node? ;D

  =Rob
