Hi,

So, there are two places where you'll see the impact of "lots of datacenters":

On the query side, global quorum queries (and queries with cross-DC
probabilistic read repair) may touch more DCs and be slower, and the read
repairs triggered during those queries get more expensive. Your geography
matters a ton for latency, and your write consistency and network quality
matter a ton for read repairs. During the read, the coordinator tracks
which replicas are mismatched and builds the mutations needed to bring them
back in sync - the more out of sync you are, the more data that buildup
accumulates in memory.
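
One way to keep that off the read path is to pin routine reads to the local
DC and only pay the cross-DC price when you really need a global quorum.
Here's a minimal sketch with the Python cassandra-driver - the contact
point, DC name, keyspace, and table are all made-up placeholders, not a
recommendation for your cluster:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy
    from cassandra.query import SimpleStatement

    # Default profile: coordinate in the local DC and only wait on local replicas.
    local_profile = ExecutionProfile(
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="us-east"),  # placeholder DC
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    cluster = Cluster(["10.0.0.1"],  # placeholder contact point
                      execution_profiles={EXEC_PROFILE_DEFAULT: local_profile})
    session = cluster.connect("my_keyspace")  # placeholder keyspace

    # Stays inside one DC: latency is bounded by the local replicas.
    rows = session.execute("SELECT * FROM users WHERE id = %s", (42,))

    # Global quorum read: the coordinator must hear from replicas in remote DCs,
    # so latency follows the WAN, and any mismatch triggers cross-DC read repair.
    global_read = SimpleStatement(
        "SELECT * FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    rows = session.execute(global_read, (42,))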

The other thing you should expect is different behavior during repairs. The
anti-entropy repairs do pair-wise merkle tree comparisons. If you imagine 6,
8, or 12 datacenters of 3 copies each, you've got 18, 24, or 36 replicas of
the data, and each of those builds a merkle tree. The repair coordinator
will hold a lot more of that in memory; adjusting the tree depth in newer
versions, or using the offheap merkle tree option in 4.0, starts relieving
the GC pressure on the coordinator in those types of topologies. In older
versions, using subrange repair with lots of smaller ranges will avoid very
deep trees and keep memory tolerable. ALSO, when you do have a mismatch,
you're going to stream a LOT of data. Again, with 12x3, if one replica goes
down beyond the hint window, when it comes back up the repair can stream it
the same data from each of the other 35 replicas - effectively 35 copies -
which is going to overwhelm it as it streams and compacts. CASSANDRA-3200
helps with this in 4.0, and incremental repair helps if you're running
incremental repair (again, probably after CASSANDRA-9143 in 4.0), but the
naive approach can lead to really bad surprises.
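
For the subrange approach, the idea is just to repair the ring in many
small token slices instead of one giant pass, so each merkle tree stays
shallow. A rough sketch (Python shelling out to nodetool; the keyspace name
and the 256-way split are made-up, and real tooling like Reaper does this
per-node against the node's actual token ownership rather than the whole
ring):

    import subprocess

    # Full Murmur3Partitioner token range.
    MIN_TOKEN = -2**63
    MAX_TOKEN = 2**63 - 1
    SPLITS = 256                # made-up split count; tune to your data density
    KEYSPACE = "my_keyspace"    # placeholder

    step = (MAX_TOKEN - MIN_TOKEN) // SPLITS
    for i in range(SPLITS):
        start = MIN_TOKEN + i * step
        end = MAX_TOKEN if i == SPLITS - 1 else start + step
        # -st / -et limit the repair (and its merkle trees) to one small slice.
        subprocess.run(
            ["nodetool", "repair", "-st", str(start), "-et", str(end), KEYSPACE],
            check=True,
        )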



On Wed, Jul 14, 2021 at 7:17 AM Shaurya Gupta <shaurya.n...@gmail.com>
wrote:

> Hi, Multiple DCs are required to maintain lower latencies for requests
> across the globe. I agree that it's a lot of redundant copies of data.
>
> On Wed, Jul 14, 2021, 7:00 PM Jim Shaw <jxys...@gmail.com> wrote:
>
>> Shaurya:
>> What's the purpose of having so many data centers?
>> RF=3 means that within one data center you have 3 copies of data.
>> If you have 3 DCs, that means 9 copies of data.
>> Think about the space and network bandwidth wasted on that many copies.
>> BTW, ours is just 2 DCs for regional DR.
>>
>> Thanks,
>> Jim
>>
>> On Wed, Jul 14, 2021 at 2:27 AM Shaurya Gupta <shaurya.n...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> Does anyone have any suggestions on the maximum number of data centers
>>> which NetworkTopologyStrategy can have for a keyspace? Not only
>>> technically, but considering performance as well.
>>> In each data center the RF is 3.
>>>
>>> Thanks!
>>> --
>>> Shaurya Gupta
>>>
>>>
>>>
