Re: Number of DCs in Cassandra
We are planning to go with 5 DCs with an RF of 3 in each. All DCs will serve reads and writes. Most queries are done at LOCAL_QUORUM.
A very small share of simple and CAS queries (<0.1%) will be done at QUORUM consistency.

On Wed, Jul 14, 2021 at 12:19 PM manish khandelwal <manishkhandelwa...@gmail.com> wrote:

> I don't think there is any restriction on the number of data centers, so
> technically you can add as many data centers as you want.
> Performance depends on how you use your cluster. For example, is one of your
> data centers read-only, or is there traffic on all of them?
>
> On Wed, Jul 14, 2021 at 12:03 PM Shaurya Gupta wrote:
>
>> Hi
>>
>> Does someone have any suggestions on the maximum number of data centers
>> that NetworkTopologyStrategy can have for a keyspace? Not only
>> technically, but considering performance as well.
>> In each data center, RF is 3.
>>
>> Thanks!
>> --
>> Shaurya Gupta

--
Shaurya Gupta
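For the 5 x 3 layout described in this message, the number of replicas that must answer at each consistency level can be worked out with a few lines of Python. This is only a sketch of the usual quorum arithmetic (floor(n / 2) + 1); the data center names `dc1` to `dc5` are placeholders and not taken from the thread.

```python
def quorum(replicas: int) -> int:
    """Replicas that must respond for a quorum: floor(n / 2) + 1."""
    return replicas // 2 + 1


# Topology from the thread: 5 DCs, RF 3 in each.
# The DC names are hypothetical placeholders.
replication = {f"dc{i}": 3 for i in range(1, 6)}

# LOCAL_QUORUM only counts replicas in the coordinator's own DC.
local_quorum = quorum(replication["dc1"])

# QUORUM counts replicas across every DC in the keyspace.
total_replicas = sum(replication.values())
global_quorum = quorum(total_replicas)

# Smallest number of DCs that can supply that many acks at RF 3 per DC
# (ceiling division).
min_dcs_for_quorum = -(-global_quorum // 3)

print(local_quorum, total_replicas, global_quorum, min_dcs_for_quorum)
```

So a LOCAL_QUORUM request waits for 2 replicas in one DC, while a QUORUM request waits for 8 of 15 replicas, which cannot be satisfied from fewer than 3 DCs.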
Re: Number of DCs in Cassandra
Reading and writing with LOCAL_QUORUM should not be a problem in terms of performance as long as all data centers are healthy. QUORUM queries, however, will take a hit from network latency between data centers; that is expected, and I believe you are aware of it.

On Wed, Jul 14, 2021 at 12:38 PM Shaurya Gupta wrote:

> We are planning to go with 5 DCs with an RF of 3 in each. All DCs will
> serve reads and writes. Most queries are done at LOCAL_QUORUM.
> A very small share of simple and CAS queries (<0.1%) will be done at
> QUORUM consistency.
>
> --
> Shaurya Gupta
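To get a rough feel for the size of that hit, one can estimate how long a coordinator waits for the required number of acknowledgements. The sketch below assumes all replicas are healthy, ignores processing time on the replicas, and uses made-up round-trip times; `ack_latency_ms` is a hypothetical helper, not a Cassandra API.

```python
def ack_latency_ms(rtt_ms_by_dc: dict, replicas_per_dc: int, acks_needed: int) -> float:
    """Estimated wait: the RTT of the slowest replica among the fastest `acks_needed`."""
    # One RTT entry per replica, fastest first.
    rtts = sorted(
        rtt for rtt in rtt_ms_by_dc.values() for _ in range(replicas_per_dc)
    )
    if acks_needed > len(rtts):
        raise ValueError("not enough replicas to satisfy the consistency level")
    return rtts[acks_needed - 1]


# Hypothetical round-trip times (ms) from the coordinator to each DC.
rtt_ms = {"local": 1, "dc2": 40, "dc3": 80, "dc4": 150, "dc5": 220}

# LOCAL_QUORUM: 2 of the 3 local replicas.
local_quorum_wait = ack_latency_ms({"local": rtt_ms["local"]}, 3, 2)

# QUORUM: 8 of the 15 replicas across all five DCs.
global_quorum_wait = ack_latency_ms(rtt_ms, 3, 8)

print(local_quorum_wait, global_quorum_wait)
```

With these assumed numbers, the QUORUM request is bounded by the round trip to the second-nearest remote DC, because 3 local replicas plus 3 from the nearest remote DC still leave it 2 acknowledgements short.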
Re: Number of DCs in Cassandra
Shaurya:
What's the purpose of running so many data centers?
RF=3 applies within a data center, so each one holds 3 copies of the data.
If you have 3 DCs, that means 9 copies of the data.
Think about the space and network bandwidth spent on that many copies.
BTW, we run just 2 DCs, for regional DR.

Thanks,
Jim

On Wed, Jul 14, 2021 at 2:27 AM Shaurya Gupta wrote:

> Hi
>
> Does someone have any suggestions on the maximum number of data centers
> that NetworkTopologyStrategy can have for a keyspace? Not only
> technically, but considering performance as well.
> In each data center, RF is 3.
>
> Thanks!
> --
> Shaurya Gupta
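The copy count above scales linearly with the number of data centers. A minimal sketch of the arithmetic, assuming a hypothetical 500 GB logical data set and ignoring compression and compaction overhead:

```python
def total_copies(dcs: int, rf_per_dc: int) -> int:
    """Every DC keeps rf_per_dc full copies of the data."""
    return dcs * rf_per_dc


def raw_storage_gb(logical_gb: float, dcs: int, rf_per_dc: int) -> float:
    """Cluster-wide storage before compression and compaction overhead."""
    return logical_gb * total_copies(dcs, rf_per_dc)


LOGICAL_GB = 500  # hypothetical data set size, not from the thread

for dcs in (2, 3, 5):
    print(dcs, total_copies(dcs, 3), raw_storage_gb(LOGICAL_GB, dcs, 3))
```

Every write also has to reach each of those copies, so the write traffic between data centers grows along with the DC count.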
Re: Number of DCs in Cassandra
Hi,

Multiple DCs are required to keep latencies low for requests from across the globe. I agree that it's a lot of redundant copies of data.

On Wed, Jul 14, 2021, 7:00 PM Jim Shaw wrote:

> Shaurya:
> What's the purpose of running so many data centers?
> RF=3 applies within a data center, so each one holds 3 copies of the data.
> If you have 3 DCs, that means 9 copies of the data.
> Think about the space and network bandwidth spent on that many copies.
> BTW, we run just 2 DCs, for regional DR.
>
> Thanks,
> Jim
Re: Number of DCs in Cassandra
Hi,

So, there are two places where you'll see the impact of "lots of datacenters".

On the query side, global quorum queries (and queries with cross-DC probabilistic read repair) may touch more DCs and be slower, and read repairs during those queries get more expensive. Your geography matters a ton for latency, and your write consistency and network quality matter a ton for read repairs. During the read, the coordinator tracks which replicas are mismatching and builds mutations to bring them in sync; that buildup will accumulate more data if you're very out of sync.

The other thing you should expect is different behavior during repairs. The anti-entropy repairs do pair-wise merkle tree comparisons. If you imagine 6, 8, or 12 datacenters of 3 copies each, you've got 18, 24, or 36 copies of data, and each of those builds a merkle tree. The repair coordinator will hold a lot more data in memory. Adjusting the tree depth in newer versions, or using the offheap option in 4.0, starts removing the GC pressure on the coordinator in those types of topologies. In older versions, using subrange repair with lots of smaller ranges will avoid very deep trees and keep memory tolerable.

ALSO, when you do have a mismatch, you're going to stream a LOT of data. Again, in 12x3, if one replica goes down beyond the hint window, when it comes up it's getting 35 copies of data, which is going to overwhelm it when it streams and compacts. CASSANDRA-3200 helps with this in 4.0, and incremental repair helps if you're running it (again, probably after CASSANDRA-9143 in 4.0), but the naive approach can lead to really bad surprises.

On Wed, Jul 14, 2021 at 7:17 AM Shaurya Gupta wrote:

> Hi, Multiple DCs are required to keep latencies low for requests from
> across the globe. I agree that it's a lot of redundant copies of data.
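The repair numbers in this message can be reproduced with a short calculation. This is a sketch under a stated assumption: every replica's merkle tree is compared against every other replica's, and a replica that fell behind may receive a stream from each of its peers. Versions with CASSANDRA-3200 can reduce the streaming side. `repair_fanout` is a hypothetical helper name.

```python
from math import comb


def repair_fanout(dcs: int, rf_per_dc: int) -> dict:
    """Rough repair cost per token range for `dcs` data centers at RF `rf_per_dc`."""
    replicas = dcs * rf_per_dc
    return {
        # One merkle tree is built per replica of the range.
        "replicas": replicas,
        # Assumed pair-wise comparison of all trees: n * (n - 1) / 2.
        "tree_pairs": comb(replicas, 2),
        # Worst case for one stale replica: a stream from every peer.
        "max_stream_sources": replicas - 1,
    }


for dcs in (6, 8, 12):
    print(dcs, repair_fanout(dcs, 3))
```

For the 12x3 case this gives 36 trees, 630 tree pairs, and up to 35 stream sources for a single replica, which matches the "35 copies" figure above.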