For memory's sake, you do not want “too many” tables in a single cluster (~200 is 
a reasonable rule of thumb). But I don’t see a major concern with a few very 
large tables in the same cluster. The client side, at least in Java, could get 
large (memory-wise) if it holds a Cluster object for each of multiple clusters.

I agree with Jeff: a cluster per app is the cleanest separation we have seen. 
Multi-tenancy leads to many more potential problems. Running multiple clusters 
per app seems unnecessarily complex.



Sean Durity – Staff Systems Engineer, Cassandra

From: S G <sg.online.em...@gmail.com>
Sent: Saturday, November 13, 2021 9:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: One big giant cluster or several smaller ones?

I think one cluster per large table should be preferred, rather than one per 
application.
For example, what if there is a large application that requires several big 
tables, each many tens of terabytes in size?
Is it still recommended to have one cluster for that app?


On Fri, Nov 12, 2021 at 2:01 PM Jeff Jirsa <jji...@gmail.com> wrote:
Oh, sorry: a cluster per application makes sense. Sharding within an 
application makes sense to avoid very, very large clusters (think ~1,000 
nodes). One cluster per app/use case.
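Sharding within one application means the app itself decides which of its clusters owns a given piece of data. A minimal sketch, assuming a hypothetical routing key (for example a tenant ID) and hypothetical cluster names; nothing here is a Cassandra or driver API:

```java
import java.util.List;

/**
 * Hypothetical application-level shard router: each routing key (e.g. a tenant ID)
 * is pinned to exactly one of several smaller clusters, so no single cluster has
 * to grow toward ~1,000 nodes.
 */
final class ShardRouter {
    private final List<String> clusterNames;

    ShardRouter(List<String> clusterNames) {
        if (clusterNames.isEmpty()) {
            throw new IllegalArgumentException("need at least one cluster");
        }
        this.clusterNames = List.copyOf(clusterNames);
    }

    /** Returns the cluster owning this key; String.hashCode() is specified, so the mapping is stable across JVMs. */
    String clusterFor(String routingKey) {
        int idx = Math.floorMod(routingKey.hashCode(), clusterNames.size());
        return clusterNames.get(idx);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("app-shard-0", "app-shard-1", "app-shard-2"));
        System.out.println(router.clusterFor("tenant-42"));
    }
}
```

Plain modulo hashing reassigns most keys whenever a cluster is added, so a real deployment would more likely keep an explicit key-to-cluster lookup table or use consistent hashing.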

On Fri, Nov 12, 2021 at 1:39 PM S G <sg.online.em...@gmail.com> wrote:
Thanks Jeff.
Are there any side effects on the client configuration from the smaller-clusters 
perspective?

For example, several smaller clusters means more CassandraClient objects on the 
client side, but I guess the number of connections will remain the same, since 
the number of physical nodes will most likely stay the same. So I think the 
client side would not see any major issue.
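The reasoning about connections can be checked with a back-of-the-envelope model. This is a sketch under stated assumptions, not a description of driver internals: one client process talks to every cluster, holds one session per cluster (roughly one Cluster object in Java driver 3.x or one CqlSession in 4.x), and each session opens a fixed, hypothetical number of connections to every node in its cluster:

```java
import java.util.List;

/**
 * Back-of-the-envelope model (an assumption, not driver internals): one session per
 * cluster, each opening a fixed number of connections to every node in its cluster.
 */
final class ClientFootprint {
    /** Total connections across all sessions held by one client process. */
    static int totalConnections(List<Integer> nodesPerCluster, int connectionsPerNode) {
        int total = 0;
        for (int nodes : nodesPerCluster) {
            total += nodes * connectionsPerNode;
        }
        return total;
    }

    /** Number of session objects, each carrying its own cluster metadata in memory. */
    static int sessionCount(List<Integer> nodesPerCluster) {
        return nodesPerCluster.size();
    }

    public static void main(String[] args) {
        List<Integer> oneBigCluster = List.of(30);
        List<Integer> threeSmallClusters = List.of(10, 10, 10);
        int perNode = 2; // hypothetical pool size per node
        System.out.println(totalConnections(oneBigCluster, perNode));      // 60
        System.out.println(totalConnections(threeSmallClusters, perNode)); // 60
        System.out.println(sessionCount(threeSmallClusters));              // 3
    }
}
```

Under these assumptions the connection count is unchanged, while the number of session objects (and their per-cluster metadata) grows with the number of clusters, which is the memory cost raised earlier in the thread.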


On Fri, Nov 12, 2021 at 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
Most people are better served building multiple clusters and spending their 
engineering time optimizing the maintenance of multiple clusters, rather than 
spending it learning how to work around the sharp edges that make large shared 
clusters hard.

Large multi-tenant clusters give you less waste and a bit more elasticity (one 
tenant can burst and use spare capacity that would typically be left for the 
other tenants). However, one bad use case or table can ruin everything (one bad 
read that generates heavy GC hits all use cases), and eventually certain 
mechanisms and subsystems don't scale past certain points (e.g. schema: large 
schemas and large clusters are much harder than small schemas and small 
clusters).




On Fri, Nov 12, 2021 at 11:31 AM S G <sg.online.em...@gmail.com> wrote:
Hello,

Is there any case where we would prefer one big giant cluster (with multiple 
large tables) over several smaller clusters?
Apart from some management overhead of multiple Cassandra Clients, it seems 
several smaller clusters are always better than a big one:

  1.  Avoids a single point of failure (SPOF) for all tables
  2.  Helps debugging (less noise from all tables in the logs)
  3.  Traffic spikes on one table do not affect others if they are in different 
clusters.
  4.  We can scale tables independently of each other, so colder data can be 
in a smaller cluster (more data/node) while hotter data can be on a bigger 
cluster (less data/node)

This does not mean that every table should be in its own cluster.
But large ones (e.g. those larger than a few terabytes) can be moved to their 
own dedicated clusters.
And smaller ones can be grouped together in one or a few clusters.
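The proposed split can be expressed as a simple placement rule. A minimal sketch, assuming a hypothetical size threshold (3,000 GB standing in for "a few terabytes") and a hypothetical cap on tables per shared cluster (200, borrowed from the rule of thumb at the top of this thread); the cluster names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical placement rule: tables above a size threshold get a dedicated
 * cluster; the rest are grouped into shared clusters capped at a maximum table count.
 */
final class TablePlacement {
    static final class Table {
        final String name;
        final long sizeGb;

        Table(String name, long sizeGb) {
            this.name = name;
            this.sizeGb = sizeGb;
        }
    }

    static Map<String, List<String>> place(List<Table> tables, long dedicatedThresholdGb,
                                           int maxTablesPerSharedCluster) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        List<String> currentShared = null;
        int sharedIndex = 0;
        for (Table t : tables) {
            if (t.sizeGb > dedicatedThresholdGb) {
                // Large table: its own cluster, so it can be scaled independently.
                clusters.put("dedicated-" + t.name, List.of(t.name));
            } else {
                // Small table: fill the current shared cluster, opening a new one when full.
                if (currentShared == null || currentShared.size() >= maxTablesPerSharedCluster) {
                    currentShared = new ArrayList<>();
                    clusters.put("shared-" + sharedIndex++, currentShared);
                }
                currentShared.add(t.name);
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<Table> tables = List.of(
                new Table("events", 40_000),
                new Table("users", 50),
                new Table("sessions", 200),
                new Table("audit", 12_000));
        // prints {dedicated-events=[events], shared-0=[users, sessions], dedicated-audit=[audit]}
        System.out.println(place(tables, 3_000, 200));
    }
}
```

Size is only one input; per the replies above, access pattern (hot vs. cold) and which application owns the table would likely matter at least as much when choosing clusters.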

Please share any recommendations on the above based on actual production 
experience.
Thanks for helping!



