Re: Number of DCs in Cassandra

2021-07-14 Thread Shaurya Gupta
We are planning to go with 5 DCs, each with an RF of 3. All DCs will serve
both reads and writes. Most queries are done at LOCAL_QUORUM.
A very small share of simple and CAS queries (<0.1%) will be done at QUORUM
consistency.
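
For anyone following the arithmetic, here is a minimal sketch of how many
replica responses each consistency level needs in this layout. It assumes
the usual majority rule (floor(n / 2) + 1); the DC names are hypothetical
placeholders, not our real ones.

```python
def quorum(replicas: int) -> int:
    """Strict majority of the replicas being considered."""
    return replicas // 2 + 1


# Replication factor per data center, as described in this thread.
# The DC names are hypothetical placeholders.
rf_per_dc = {"dc1": 3, "dc2": 3, "dc3": 3, "dc4": 3, "dc5": 3}

# LOCAL_QUORUM only counts replicas in the coordinator's own DC.
local_quorum = quorum(rf_per_dc["dc1"])

# QUORUM counts replicas across every DC in the keyspace.
global_quorum = quorum(sum(rf_per_dc.values()))

# Fewest DCs that must answer for QUORUM to succeed
# (take the DCs with the largest RF first).
min_dcs = 0
covered = 0
for rf in sorted(rf_per_dc.values(), reverse=True):
    if covered >= global_quorum:
        break
    covered += rf
    min_dcs += 1

print(f"LOCAL_QUORUM needs {local_quorum} of {rf_per_dc['dc1']} local replicas")
print(f"QUORUM needs {global_quorum} of {sum(rf_per_dc.values())} replicas")
print(f"QUORUM needs responses from at least {min_dcs} DCs")
```

So the rare QUORUM queries always have to wait on replicas outside the
local DC, while LOCAL_QUORUM queries never do.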

On Wed, Jul 14, 2021 at 12:19 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> I don't think there is any restriction on the number of data centers. So
> technically you can add as many data centers as you want.
> Performance depends on how you use your cluster. For example, is one of
> your data centers read-only, or is there traffic on all the data
> centers?
>
>
> On Wed, Jul 14, 2021 at 12:03 PM Shaurya Gupta 
> wrote:
>
>> Hi
>>
>> Does anyone have suggestions on the maximum number of data centers that
>> NetworkTopologyStrategy can have for a keyspace, not only technically
>> but also considering performance?
>> In each data center the RF is 3.
>>
>> Thanks!
>> --
>> Shaurya Gupta
>>
>>
>>

-- 
Shaurya Gupta


Re: Number of DCs in Cassandra

2021-07-14 Thread manish khandelwal
Reading and writing with LOCAL_QUORUM should not be a problem in terms of
performance if all data centers are healthy. But QUORUM queries will take a
hit due to cross-DC network latency; that is expected, and I believe you
are aware of it.
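
A rough way to picture the QUORUM hit, as a sketch only: the coordinator
cannot answer until enough replicas have responded, so the wait is roughly
the latency of the k-th fastest replica. The round-trip times below are
made-up numbers, and the model ignores processing time, retries and any
replica selection the coordinator does.

```python
def kth_fastest(latencies_ms, k):
    """Time at which k responses have arrived, if all requests start together."""
    return sorted(latencies_ms)[k - 1]


# Hypothetical round-trip times (ms) from a coordinator in dc1 to each DC.
rtt_ms = {"dc1": 1, "dc2": 40, "dc3": 80, "dc4": 150, "dc5": 220}
rf = 3  # replicas per DC, as in this thread

# One latency entry per replica: every replica in a DC shares that DC's RTT.
all_replicas = [rtt for rtt in rtt_ms.values() for _ in range(rf)]
local_replicas = [rtt_ms["dc1"]] * rf

quorum_size = len(all_replicas) // 2 + 1  # majority of all 15 replicas
local_quorum_size = rf // 2 + 1           # majority of the 3 local replicas

quorum_wait = kth_fastest(all_replicas, quorum_size)
local_wait = kth_fastest(local_replicas, local_quorum_size)

print(f"LOCAL_QUORUM waits about {local_wait} ms")
print(f"QUORUM waits about {quorum_wait} ms")
```

With these made-up numbers the QUORUM wait is set by the third-nearest
DC, because two DCs alone only hold 6 of the 8 replicas required.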



Re: Number of DCs in Cassandra

2021-07-14 Thread Jim Shaw
Shaurya:
What is the purpose of running so many data centers?
RF=3 applies within a data center, so each DC holds 3 copies of the data.
If you have 3 DCs, that means 9 copies of the data.
Think about the space and network bandwidth wasted on that number of copies.
BTW, ours has just 2 DCs, for regional DR.
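
To put numbers on it, here is a small sketch of the copy count and raw
storage. The 2 TB dataset size is a made-up figure for illustration, and
the estimate ignores compression and compaction overhead.

```python
def total_copies(num_dcs: int, rf_per_dc: int) -> int:
    """Every DC keeps rf_per_dc full copies of each row."""
    return num_dcs * rf_per_dc


def raw_storage_tb(unique_data_tb: float, num_dcs: int, rf_per_dc: int) -> float:
    """Cluster-wide storage before compression or compaction overhead."""
    return unique_data_tb * total_copies(num_dcs, rf_per_dc)


unique_data_tb = 2.0  # hypothetical size of one copy of the data

for num_dcs in (2, 3, 5):
    copies = total_copies(num_dcs, 3)
    storage = raw_storage_tb(unique_data_tb, num_dcs, 3)
    print(f"{num_dcs} DCs x RF 3 = {copies} copies, about {storage:.0f} TB raw")
```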

Thanks,
Jim



Re: Number of DCs in Cassandra

2021-07-14 Thread Shaurya Gupta
Hi, multiple DCs are required to maintain lower latencies for requests
across the globe. I agree that it's a lot of redundant copies of data.



Re: Number of DCs in Cassandra

2021-07-14 Thread Jeff Jirsa
Hi,

So, there are two places where you'll see the impact of "lots of
datacenters":

On the query side, global quorum queries (and queries with cross-DC
probabilistic read repair) may touch more DCs and be slower, and
read repairs during those queries get more expensive. Your geography
matters a ton for latency, and your write consistency and network quality
matter a ton for read repairs. During the read, the coordinator tracks
which replicas mismatch and builds mutations to bring them back in sync;
that buildup accumulates more data if you're very out of sync.
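
As a toy illustration of that last point (not Cassandra's actual code):
assume each replica returns a single cell with a write timestamp, and the
newest timestamp wins. The coordinator then has to hold one repair
mutation for every replica that returned something older. The replica
names and values are made up.

```python
from typing import Dict, NamedTuple, Tuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # write timestamp; the highest one wins in this toy model


def build_read_repairs(responses: Dict[str, Cell]) -> Tuple[Cell, Dict[str, Cell]]:
    """Pick the newest cell and list the replicas that need it written back."""
    newest = max(responses.values(), key=lambda cell: cell.timestamp)
    repairs = {
        replica: newest
        for replica, cell in responses.items()
        if cell.timestamp < newest.timestamp
    }
    return newest, repairs


# Hypothetical responses gathered by the coordinator for one read.
responses = {
    "dc1-a": Cell("v2", 200),
    "dc1-b": Cell("v1", 100),
    "dc2-a": Cell("v2", 200),
    "dc3-a": Cell("v1", 100),
}

newest, repairs = build_read_repairs(responses)
print(f"result: {newest.value}, stale replicas: {sorted(repairs)}")
```

The more replicas a query touches and the further they have drifted, the
larger that repairs map gets, which is the buildup described above.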

The other thing you should expect is different behavior during repairs.
Anti-entropy repairs compare Merkle trees pairwise. If you imagine 6, 8, or
12 datacenters of 3 copies each, you've got 18, 24, or 36 copies of the
data, and each of those holds a Merkle tree. The repair coordinator will
have a lot more data in memory; adjusting the tree depth in newer versions,
or using the offheap option in 4.0, starts removing the GC pressure on the
coordinator in those types of topologies. In older versions, using subrange
repair with lots of smaller ranges will avoid very deep trees and keep
memory tolerable.

Also, when you do have a mismatch, you're going to stream a LOT of data.
Again, in 12x3, if one replica goes down beyond the hint window, when it
comes back up it's getting 35 copies of data, which is going to overwhelm
it when it streams and compacts. CASSANDRA-3200 helps with this in 4.0, and
incremental repair helps if you're running it (again, probably after
CASSANDRA-9143 in 4.0), but the naive approach can lead to really bad
surprises.
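
To make the scaling concrete, here is a rough sketch. It assumes every
pair of replicas for a range gets compared, which gives n * (n - 1) / 2
comparisons, and it shows one simple way to cut the full Murmur3 token
range (-2^63 to 2^63 - 1) into equal subranges for subrange repair. The
helper names are made up for illustration; the resulting bounds are the
kind of values you would hand to nodetool repair's start/end token
options.

```python
def pairwise_comparisons(replicas: int) -> int:
    """Number of distinct replica pairs: n * (n - 1) / 2."""
    return replicas * (replicas - 1) // 2


def split_token_range(parts: int, start: int = -2**63, end: int = 2**63 - 1):
    """Split (start, end] into `parts` contiguous subranges of near-equal width."""
    width = end - start
    bounds = [start + width * i // parts for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


for dcs in (6, 8, 12):
    replicas = dcs * 3
    print(
        f"{dcs} DCs x 3 = {replicas} replicas, "
        f"{pairwise_comparisons(replicas)} tree pairs, "
        f"{replicas - 1} possible stream sources for one stale replica"
    )

# Hypothetical split into 4 subranges; real runs would use many more.
subranges = split_token_range(4)
for lo, hi in subranges:
    print(f"subrange ({lo}, {hi}]")
```

The pair count grows quadratically with the number of replicas, which is
why smaller ranges (and therefore smaller trees) matter so much more in
the 12x3 case than in a 2-DC cluster.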


