I think one cluster per large table should be preferred, rather than one per
application.
For example, what if a large application requires several big
tables, each many tens of terabytes in size?
Is it still recommended to have one cluster for that app?
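
On the client-connection point discussed later in the thread, here is a toy
back-of-envelope sketch (plain Python, hypothetical numbers; real drivers such
as the DataStax one typically open one or a few connections per node per client
process) showing why the total connection count tracks the node count rather
than the cluster count:

```python
# Toy model: total client connections depend on the number of nodes,
# not on how those nodes are partitioned into clusters.
# conns_per_node is a hypothetical figure, not a driver default.

def total_connections(nodes_per_cluster, conns_per_node=1):
    """Sum connections across all clusters a client talks to."""
    return sum(n * conns_per_node for n in nodes_per_cluster)

# 90 nodes as one big cluster vs. three clusters of 30 nodes each:
one_big = total_connections([90])
three_small = total_connections([30, 30, 30])
assert one_big == three_small == 90
```

So splitting the same hardware into several clusters mostly adds client/session
objects, not sockets, which matches the observation below.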


On Fri, Nov 12, 2021 at 2:01 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Oh sorry - a cluster per application makes sense. Sharding within an
> application makes sense to avoid very very very large clusters (think:
> ~thousand nodes). 1 cluster per app/use case.
>
> On Fri, Nov 12, 2021 at 1:39 PM S G <sg.online.em...@gmail.com> wrote:
>
>> Thanks Jeff.
>> Any side-effect on the client config from small clusters perspective?
>>
>> Several smaller clusters mean more CassandraClient objects on the client
>> side, but I guess the number of connections should remain the same, since
>> the number of physical nodes will most likely stay the same. So I think the
>> client side would not see any major issue.
>>
>>
>> On Fri, Nov 12, 2021 at 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Most people are better served building multiple clusters and spending
>>> their engineering time optimizing for maintaining multiple clusters, vs
>>> spending their engineering time learning how to work around the sharp edges
>>> that make large shared clusters hard.
>>>
>>> Large multi-tenant clusters give you less waste and a bit more
>>> elasticity (one tenant can burst and use spare capacity that would
>>> typically be left for the other tenants). However, one bad use case / table
>>> can ruin everything (one bad read that generates GC hits all use cases),
>>> and eventually certain mechanisms/subsystems don't scale past certain
>>> points (e.g. schema - large schemas and large clusters are much harder
>>> than small schemas and small clusters).
>>>
>>>
>>>
>>>
>>> On Fri, Nov 12, 2021 at 11:31 AM S G <sg.online.em...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Is there any case where we would prefer one big giant cluster (with
>>>> multiple large tables) over several smaller clusters?
>>>> Apart from some management overhead of multiple Cassandra Clients, it
>>>> seems several smaller clusters are always better than a big one:
>>>>
>>>>    1. Avoids a SPOF for all tables
>>>>    2. Helps debugging (less noise from other tables in the logs)
>>>>    3. Traffic spikes on one table do not affect others if they are in
>>>>    different clusters.
>>>>    4. We can scale tables independently of each other - so colder data
>>>>    can be in a smaller cluster (more data/node) while hotter data can be
>>>>    on a bigger cluster (less data/node)
>>>>
>>>>
>>>> It does not mean that every table should be in its own cluster.
>>>> But large ones (say, those bigger than a few terabytes) can be moved to
>>>> their own dedicated clusters, and smaller ones can be grouped together
>>>> in one or a few clusters.
>>>>
>>>> Please share any recommendations for the above from actual production
>>>> experiences.
>>>> Thanks for helping!
>>>>
>>>>
