For memory's sake, you do not want “too many” tables in a single cluster (~200 is a reasonable rule of thumb). But I don’t see a major concern with a few very large tables in the same cluster. The client side, at least in Java, could get large (memory-wise) holding a Cluster object for each of several clusters.
I agree with Jeff: a cluster per app is the cleanest separation we have seen. Multi-tenant leads to many more potential problems. Multi-cluster per app seems unnecessarily complex.

Sean Durity – Staff Systems Engineer, Cassandra

From: S G <sg.online.em...@gmail.com>
Sent: Saturday, November 13, 2021 9:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: One big giant cluster or several smaller ones?

I think 1 cluster per large table should be preferred, rather than per application. For example, what if there is a large application that requires several big tables, each many tens of terabytes in size? Is it still recommended to have 1 cluster for that app?

On Fri, Nov 12, 2021 at 2:01 PM Jeff Jirsa <jji...@gmail.com> wrote:

Oh sorry - a cluster per application makes sense. Sharding within an application makes sense to avoid very very very large clusters (think: ~thousand nodes). 1 cluster per app/use case.

On Fri, Nov 12, 2021 at 1:39 PM S G <sg.online.em...@gmail.com> wrote:

Thanks Jeff. Any side-effect on the client config from the small-clusters perspective? Several smaller clusters mean more CassandraClient objects on the client side, but I guess the number of connections will remain the same, since the number of physical nodes will most likely remain the same. So I think the client side would not see any major issue.

On Fri, Nov 12, 2021 at 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:

Most people are better served building multiple clusters and spending their engineering time optimizing for maintaining multiple clusters, vs spending their engineering time learning how to work around the sharp edges that make large shared clusters hard. Large multi-tenant clusters give you less waste and a bit more elasticity (one tenant can burst and use spare capacity that would typically be left for the other tenants). However, one bad use case / table can ruin everything (one bad read that generates GC hits all use cases), and eventually certain mechanisms/subsystems don't scale past certain points (e.g. schema - large schemas and large clusters are much harder than small schemas and small clusters).

On Fri, Nov 12, 2021 at 11:31 AM S G <sg.online.em...@gmail.com> wrote:

Hello,

Is there any case where we would prefer one big giant cluster (with multiple large tables) over several smaller clusters? Apart from some management overhead of multiple Cassandra Clients, it seems several smaller clusters are always better than a big one:

1. Avoids SPOF for all tables
2. Helps debugging (less noise from all tables in the logs)
3. Traffic spikes on one table do not affect others if they are in different clusters.
4. We can scale tables independently of each other - so colder data can be in a smaller cluster (more data/node) while hotter data can be on a bigger cluster (less data/node)

It does not mean that every table should be in its own cluster. But large ones can be moved to their own dedicated clusters (like those more than a few terabytes). And smaller ones can be clubbed together in one or a few clusters.

Please share any recommendations for the above from actual production experiences.

Thanks for helping!
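
The two rules of thumb from this thread (one cluster per application, and roughly 200 tables per cluster at most) can be sketched as a small planning check. This is only an illustrative Python sketch, not anything taken from Cassandra itself: the function names, the inventory, and the application and table names are all hypothetical, and the limit of 200 is the rule of thumb quoted in the thread, not a hard limit.

```python
from collections import defaultdict

# Rule of thumb quoted in the thread; an assumption, not a hard Cassandra limit.
MAX_TABLES_PER_CLUSTER = 200


def plan_clusters(tables):
    """Group (app, table) pairs into one cluster per application.

    Returns a dict mapping app name -> sorted list of table names.
    """
    clusters = defaultdict(list)
    for app, table in tables:
        clusters[app].append(table)
    return {app: sorted(names) for app, names in clusters.items()}


def oversized(plan, limit=MAX_TABLES_PER_CLUSTER):
    """Return the apps whose cluster would hold more tables than the limit."""
    return sorted(app for app, names in plan.items() if len(names) > limit)


# Hypothetical inventory: app and table names are made up for illustration.
inventory = [
    ("billing", "invoices"),
    ("billing", "payments"),
    ("search", "documents"),
]
plan = plan_clusters(inventory)
print(plan)
print(oversized(plan))
```

An application that shows up in the `oversized` list would be a candidate for the sharding within an application that Jeff mentions, rather than for sharing a cluster with other applications.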