I think one cluster per large table should be preferred, rather than one per application. For example, what if a large application requires several big tables, each many tens of terabytes in size? Is it still recommended to have one cluster for that app?
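As a rough back-of-envelope sketch of why tens-of-terabytes tables push toward dedicated clusters: node count scales with replicated data volume divided by per-node density. The ~1.5 TB/node density target and RF=3 below are assumptions for illustration, not figures from this thread; tune them for your hardware and how long repair/bootstrap can take.

```python
import math

def nodes_needed(raw_data_tb: float, replication_factor: int = 3,
                 per_node_density_tb: float = 1.5) -> int:
    """Estimate cluster size from raw (unreplicated) data volume.

    per_node_density_tb is an assumed target; higher density means
    fewer nodes but slower streaming, repair, and replacement.
    """
    total_tb = raw_data_tb * replication_factor
    return math.ceil(total_tb / per_node_density_tb)

# A single 40 TB table at RF=3 and ~1.5 TB/node already needs
# a substantial cluster on its own:
print(nodes_needed(40))  # → 80
```

By this arithmetic, two or three such tables in one shared cluster would push it toward the very large sizes (hundreds of nodes) that Jeff suggests sharding to avoid.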
On Fri, Nov 12, 2021 at 2:01 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Oh sorry - a cluster per application makes sense. Sharding within an
> application makes sense to avoid very very very large clusters (think:
> ~thousand nodes). 1 cluster per app/use case.
>
> On Fri, Nov 12, 2021 at 1:39 PM S G <sg.online.em...@gmail.com> wrote:
>
>> Thanks Jeff.
>> Any side-effect on the client config from a small-clusters perspective?
>>
>> Several smaller clusters mean more CassandraClient objects on the
>> client side, but I guess the number of connections should remain the
>> same, since the number of physical nodes will most likely stay the
>> same. So I think the client side would not see any major issue.
>>
>> On Fri, Nov 12, 2021 at 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Most people are better served building multiple clusters and spending
>>> their engineering time optimizing for maintaining multiple clusters,
>>> vs spending their engineering time learning how to work around the
>>> sharp edges that make large shared clusters hard.
>>>
>>> Large multi-tenant clusters give you less waste and a bit more
>>> elasticity (one tenant can burst and use spare capacity that would
>>> typically be left for the other tenants). However, one bad use case /
>>> table can ruin everything (one bad read that generates GC hits all
>>> use cases), and eventually certain mechanisms/subsystems don't scale
>>> past certain points (e.g. schema - large schemas and large clusters
>>> are much harder than small schemas and small clusters).
>>>
>>> On Fri, Nov 12, 2021 at 11:31 AM S G <sg.online.em...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Is there any case where we would prefer one big giant cluster (with
>>>> multiple large tables) over several smaller clusters?
>>>> Apart from some management overhead of multiple Cassandra clients,
>>>> it seems several smaller clusters are always better than one big one:
>>>>
>>>> 1. Avoids a SPOF for all tables.
>>>> 2. Helps debugging (less noise from all tables in the logs).
>>>> 3. Traffic spikes on one table do not affect others if they are in
>>>> different clusters.
>>>> 4. We can scale tables independently of each other - so colder data
>>>> can be in a smaller cluster (more data/node) while hotter data can
>>>> be on a bigger cluster (less data/node).
>>>>
>>>> It does not mean that every table should be in its own cluster.
>>>> But large ones can be moved to their own dedicated clusters (e.g.
>>>> those more than a few terabytes).
>>>> And smaller ones can be grouped together in one or a few clusters.
>>>>
>>>> Please share any recommendations for the above from actual
>>>> production experiences.
>>>> Thanks for helping!
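On the client-connection point quoted above: with modern Cassandra drivers, each session opens on the order of one connection per node, so the total from one client process tracks the total node count, not the number of clusters or sessions. A minimal sketch of that arithmetic (the one-connection-per-node figure is an assumed default; some drivers and older protocol versions open more):

```python
def total_client_connections(nodes_per_cluster, conns_per_node=1):
    """Connections from one client process holding one session per
    cluster; the total depends only on the overall node count."""
    return sum(n * conns_per_node for n in nodes_per_cluster)

# One 30-node cluster vs. three 10-node clusters: same total,
# which is why splitting clusters adds client objects but not
# materially more connections.
print(total_client_connections([30]))          # → 30
print(total_client_connections([10, 10, 10]))  # → 30
```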