Thanks, Jon! I just added the AZ for each rack in the right-hand column.
Thanks for your reply and the clarification. Maybe I should have named the
racks RACK-READ and RACK-WRITE instead of ONE and TWO to avoid confusion.

One more question: which is more fault tolerant with RF=3: A) spreading each
DC across 3 AZs, or B) assigning each DC a separate AZ? I assume I should
adjust the consistency level accordingly in case of failures: if I have 3
nodes with RF=3 and LOCAL_QUORUM consistency and 1 node goes down, I should
downgrade to LOCAL_ONE if I want to keep serving read traffic.
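For example, on the client side I imagine something like this (an untested
sketch using the DataStax Python driver; the contact point, keyspace, and
table names are made up):

    from cassandra import ConsistencyLevel, Unavailable
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['10.1.20.49'])         # any contact point in the local DC
    session = cluster.connect('my_keyspace')  # illustrative keyspace name

    # Try the read at LOCAL_QUORUM first.
    stmt = SimpleStatement("SELECT * FROM my_table WHERE id = %s",
                           consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    try:
        rows = session.execute(stmt, ["some-id"])
    except Unavailable:
        # Not enough live replicas for quorum: retry at LOCAL_ONE so we
        # keep serving reads during the outage.
        stmt.consistency_level = ConsistencyLevel.LOCAL_ONE
        rows = session.execute(stmt, ["some-id"])

(I believe the driver also ships a DowngradingConsistencyRetryPolicy that
does roughly this automatically.)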
Best,
Sergio

On Wed, Oct 23, 2019 at 2:12 PM Jon Haddad <j...@jonhaddad.com> wrote:

> Oh, my bad. There was a flood of information there; I didn't realize you
> had switched to two DCs. It's been a long day.
>
> I'll be honest: it's really hard to read your various options, because
> you've intermixed terminology from AWS and Cassandra in a weird way and
> there are several pages of information here to go through. I don't have
> time to decipher it, sorry.
>
> Spread a DC across 3 AZs if you want to be fault tolerant and will use
> RF=3; use a single AZ if you don't care about full DC failure when an AZ
> fails, or you're not using RF=3.
>
> On Wed, Oct 23, 2019 at 4:56 PM Sergio <lapostadiser...@gmail.com> wrote:
>
>> OPTION C or OPTION A?
>>
>> Which one are you referring to?
>>
>> Both have separate DCs to keep the workloads separate.
>>
>> OPTION A)
>>
>> Node  DC     RACK  AZ
>> 1     read   ONE   us-east-1a
>> 2     read   ONE   us-east-1a
>> 3     read   ONE   us-east-1a
>> 4     write  TWO   us-east-1b
>> 5     write  TWO   us-east-1b
>> 6     write  TWO   us-east-1b
>>
>> Here we have 2 DCs (read and write),
>> one rack per DC, and
>> one availability zone per DC.
>>
>> Thanks,
>>
>> Sergio
>>
>> On Wed, Oct 23, 2019, 1:11 PM Jon Haddad <j...@jonhaddad.com> wrote:
>>
>>> Personally, I wouldn't ever do this. I recommend separate DCs if you
>>> want to keep workloads separate.
>>>
>>> On Wed, Oct 23, 2019 at 4:06 PM Sergio <lapostadiser...@gmail.com>
>>> wrote:
>>>
>>>> I forgot to comment on OPTION C)
>>>>
>>>> Node  DC     RACK  AZ
>>>> 1     read   ONE   us-east-1a
>>>> 2     read   ONE   us-east-1b
>>>> 3     read   ONE   us-east-1c
>>>> 4     write  TWO   us-east-1a
>>>> 5     write  TWO   us-east-1b
>>>> 6     write  TWO   us-east-1c
>>>>
>>>> Here I would expect to have to decrease the consistency level for
>>>> reads if one of the AZs goes down. Also, please consider the one below
>>>> as the real OPTION A; the previous one looks wrong because the same
>>>> rack name was assigned to 2 different DCs.
>>>>
>>>> OPTION A)
>>>>
>>>> Node  DC     RACK  AZ
>>>> 1     read   ONE   us-east-1a
>>>> 2     read   ONE   us-east-1a
>>>> 3     read   ONE   us-east-1a
>>>> 4     write  TWO   us-east-1b
>>>> 5     write  TWO   us-east-1b
>>>> 6     write  TWO   us-east-1b
>>>>
>>>> Thanks,
>>>>
>>>> Sergio
>>>>
>>>> On Wed, Oct 23, 2019 at 12:33 PM Sergio <lapostadiser...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Reid,
>>>>>
>>>>> Thank you very much for clearing up these concepts for me. I posted
>>>>> a question about our unbalanced cluster on the DataStax forum
>>>>> (https://community.datastax.com/comments/1133/view.html), and the
>>>>> reply was that *the number of racks should be a multiple of the
>>>>> replication factor* (or 1) for the cluster to be balanced
>>>>> (presumably because NetworkTopologyStrategy spreads replicas across
>>>>> distinct racks, so with RF=3 and only 2 racks one rack ends up
>>>>> holding 2 of the 3 copies of each range, which skews ownership). So
>>>>> I thought: if I have 3 availability zones, I should have 3 racks for
>>>>> each datacenter, not the 2 (us-east-1b, us-east-1a) I have right
>>>>> now; or, in the simplest setup, one rack per datacenter.
>>>>>
>>>>> Datacenter: live
>>>>> ================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load        Tokens  Owns  Host ID                               Rack
>>>>> UN  10.1.20.49   289.75 GiB  256     ?     be5a0193-56e7-4d42-8cc8-5d2141ab4872  us-east-1a
>>>>> UN  10.1.30.112  103.03 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
>>>>> UN  10.1.19.163  129.61 GiB  256     ?     3c2efdda-8dd4-4f08-b991-9aff062a5388  us-east-1a
>>>>> UN  10.1.26.181  145.28 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
>>>>> UN  10.1.17.213  149.04 GiB  256     ?     71563e86-b2ae-4d2c-91c5-49aa08386f67  us-east-1a
>>>>> DN  10.1.19.198  52.41 GiB   256     ?     613b43c0-0688-4b86-994c-dc772b6fb8d2  us-east-1b
>>>>> UN  10.1.31.60   195.17 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
>>>>> UN  10.1.25.206  100.67 GiB  256     ?     f43532ad-7d2e-4480-a9ce-2529b47f823d  us-east-1b
>>>>>
>>>>> Each rack label right now matches the availability zone; we have 3
>>>>> datacenters and 2 availability zones with 2 racks per DC, and the
>>>>> above is clearly unbalanced.
>>>>>
>>>>> If I have a keyspace with replication factor = 3 and I want to
>>>>> minimize the number of nodes needed to scale the cluster up and down
>>>>> while keeping it balanced, should I consider an approach like
>>>>> OPTION A?
>>>>>
>>>>> OPTION A)
>>>>>
>>>>> Node  DC     RACK  AZ
>>>>> 1     read   ONE   us-east-1a
>>>>> 2     read   ONE   us-east-1a
>>>>> 3     read   ONE   us-east-1a
>>>>> 4     write  ONE   us-east-1b
>>>>> 5     write  ONE   us-east-1b
>>>>> 6     write  ONE   us-east-1b
>>>>>
>>>>> OPTION B)
>>>>>
>>>>> Node  DC     RACK  AZ
>>>>> 1     read   ONE   us-east-1a
>>>>> 2     read   ONE   us-east-1a
>>>>> 3     read   ONE   us-east-1a
>>>>> 4     write  TWO   us-east-1b
>>>>> 5     write  TWO   us-east-1b
>>>>> 6     write  TWO   us-east-1b
>>>>> 7     read   ONE   us-east-1c
>>>>> 8     write  TWO   us-east-1c
>>>>> 9     read   ONE   us-east-1c
>>>>>
>>>>> Option B looks unbalanced, so I would exclude it.
>>>>>
>>>>> OPTION C)
>>>>>
>>>>> Node  DC     RACK  AZ
>>>>> 1     read   ONE   us-east-1a
>>>>> 2     read   ONE   us-east-1b
>>>>> 3     read   ONE   us-east-1c
>>>>> 4     write  TWO   us-east-1a
>>>>> 5     write  TWO   us-east-1b
>>>>> 6     write  TWO   us-east-1c
>>>>>
>>>>> So I am thinking of A if I have the restriction of 2 AZs, but I
>>>>> guess option C would be the best. If I have to add another DC for
>>>>> reads, because we want to assign a new DC to each new microservice,
>>>>> it would look like:
>>>>>
>>>>> OPTION EXTRA DC FOR READS)
>>>>>
>>>>> Node  DC          RACK   AZ
>>>>> 1     read        ONE    us-east-1a
>>>>> 2     read        ONE    us-east-1b
>>>>> 3     read        ONE    us-east-1c
>>>>> 4     write       TWO    us-east-1a
>>>>> 5     write       TWO    us-east-1b
>>>>> 6     write       TWO    us-east-1c
>>>>> 7     extra-read  THREE  us-east-1a
>>>>> 8     extra-read  THREE  us-east-1b
>>>>> 9     extra-read  THREE  us-east-1c
>>>>>
>>>>> The *write* DC will replicate the data to the other datacenters. My
>>>>> goal is to keep the *read* machines dedicated to serving reads and
>>>>> the *write* machines to serving writes; Cassandra will handle the
>>>>> replication for me. Is there any other option I am missing, or any
>>>>> wrong assumption? I am thinking of writing a blog post about all my
>>>>> learnings so far. Thank you very much for the replies.
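>>>>> For what it's worth, I imagine the keyspace for OPTION C would be
>>>>> defined roughly like this (an untested sketch with the DataStax
>>>>> Python driver; the keyspace name is made up, and the per-DC factors
>>>>> assume RF=3 in both DCs):
>>>>>
>>>>>     from cassandra.cluster import Cluster
>>>>>
>>>>>     cluster = Cluster(['10.1.20.49'])   # illustrative contact point
>>>>>     session = cluster.connect()
>>>>>
>>>>>     # NetworkTopologyStrategy counts the replication factor per DC:
>>>>>     # 3 copies in DC 'read' and 3 copies in DC 'write'.
>>>>>     session.execute("""
>>>>>         CREATE KEYSPACE my_ks WITH replication = {
>>>>>             'class': 'NetworkTopologyStrategy',
>>>>>             'read': 3,
>>>>>             'write': 3
>>>>>         }
>>>>>     """)
>>>>>
>>>>> Writes arriving in either DC would be replicated to both, so the
>>>>> isolation is about which DC the clients point their traffic at, not
>>>>> about where the data lives.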
>>>>> Best,
>>>>> Sergio
>>>>>
>>>>> On Wed, Oct 23, 2019 at 10:57 AM Reid Pinchback <
>>>>> rpinchb...@tripadvisor.com> wrote:
>>>>>
>>>>>> No, that's not correct. The point of racks is to help you distribute
>>>>>> the replicas, not further-replicate the replicas. Data centers are
>>>>>> what do the latter. So, for example, if you wanted to ensure that
>>>>>> you always had quorum when an AZ went down, you could have two DCs,
>>>>>> one in each AZ, with one rack in each DC. In your situation I think
>>>>>> I'd be more tempted to consider that. Then if an AZ went away, you
>>>>>> could fail over your traffic to the remaining DC and still be
>>>>>> perfectly fine.
>>>>>>
>>>>>> For background on replicas vs. racks, I believe the information you
>>>>>> want is under the heading 'NetworkTopologyStrategy' at:
>>>>>>
>>>>>> http://cassandra.apache.org/doc/latest/architecture/dynamo.html
>>>>>>
>>>>>> That should help you better understand how replicas distribute.
>>>>>>
>>>>>> As mentioned before, while you can choose to do the reads in one DC,
>>>>>> except for concerns about contention related to network traffic and
>>>>>> connection handling, you can't isolate reads from writes. You can
>>>>>> *mostly* insulate the write DC from the activity within the read DC,
>>>>>> and even that isn't absolute, because of repairs. However, your
>>>>>> mileage may vary, so do what makes sense for your usage pattern.
>>>>>>
>>>>>> R
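>>>>>> (To make the replica arithmetic concrete, assuming
>>>>>> NetworkTopologyStrategy with the factor counted per DC: 100 GB of
>>>>>> raw data at RF=3 occupies roughly 100 GB x 3 = 300 GB in each DC
>>>>>> carrying that factor, whether that DC uses 1 rack or 3. Racks decide
>>>>>> which nodes hold the three copies; they do not add copies.)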
>>>>>> From: Sergio <lapostadiser...@gmail.com>
>>>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>> Date: Wednesday, October 23, 2019 at 12:50 PM
>>>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>> Subject: Re: Cassandra Rack - Datacenter Load Balancing relations
>>>>>>
>>>>>> Hi Reid,
>>>>>>
>>>>>> Thanks for your reply. I really appreciate your explanation.
>>>>>>
>>>>>> We are in AWS, and we are using 2 availability zones right now, not
>>>>>> 3. We found our cluster really unbalanced because the keyspace has a
>>>>>> replication factor = 3 and the number of racks is 2, with 2
>>>>>> datacenters. We want the writes spread across all the nodes, but we
>>>>>> want the reads isolated from the writes, to keep the load on those
>>>>>> nodes low and to be able to tell problems in the consumer (read)
>>>>>> applications apart from problems in the producer (write)
>>>>>> applications.
>>>>>> It looks like each rack contains an entire copy of the data, so the
>>>>>> information would be replicated once per rack and then per node. If
>>>>>> I am correct, a keyspace with 100GB, replication factor = 3, and
>>>>>> racks = 3 gives 100 * 3 * 3 = 900GB. If I had only one rack across 2
>>>>>> or even 3 availability zones, I would save space and have only
>>>>>> 300GB. Please correct me if I am wrong.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Sergio
>>>>>>
>>>>>> On Wed, Oct 23, 2019 at 9:21 AM Reid Pinchback <
>>>>>> rpinchb...@tripadvisor.com> wrote:
>>>>>>
>>>>>> Datacenters and racks are different concepts. While they don't have
>>>>>> to keep their historical meanings, the historical meanings probably
>>>>>> provide a helpful model for understanding what you want from them.
>>>>>>
>>>>>> When companies own their own physical servers and have them housed
>>>>>> somewhere, the question arises of where to locate any particular
>>>>>> server. It's a balancing act between things like the network speed
>>>>>> of related servers talking to each other and the fault tolerance of
>>>>>> not having many servers all exposed to the same risks.
>>>>>>
>>>>>> "Same rack" in that physical world tended to mean something like
>>>>>> "all behind the same network switch and all sharing the same power
>>>>>> bus". The morning after an electrical glitch fries a power bus, and
>>>>>> thus everything in that rack, you realize you wish you didn't have
>>>>>> so many of the same type of server together. Well, they were
>>>>>> servers. Now they are door stops. Badness and sadness.
>>>>>>
>>>>>> That's the mindset to have with racks in Cassandra. A rack is an
>>>>>> artifact for separating servers into pools so that the disparate
>>>>>> pools have, hopefully, somewhat independent infrastructure risks.
>>>>>> However, all those servers are still doing the same kind of work,
>>>>>> are the same version, etc.
>>>>>>
>>>>>> Datacenters are amalgams of those racks, and how similar or
>>>>>> different they are from each other depends on what you want to do
>>>>>> with them. What is true is that if you have N datacenters, each one
>>>>>> of them must have enough disk storage to house all the data. The
>>>>>> actual physical footprint of that data in each DC depends on the
>>>>>> replication factors in play.
>>>>>>
>>>>>> Note that you sort of can't have "one datacenter for writes",
>>>>>> because the writes will replicate across the data centers. You could
>>>>>> definitely choose to have only one DC that takes read queries, but
>>>>>> it's best to think of writing as universal. One scenario you can
>>>>>> have is where the DC not taking live read traffic is the one you use
>>>>>> for maintenance, performance testing, or version upgrades.
>>>>>>
>>>>>> One rack makes your life easier if you don't have a reason for
>>>>>> multiple racks. It depends on the environment you deploy into and
>>>>>> your fault tolerance goals. If you were in AWS and wanted to spread
>>>>>> risk across availability zones, then you would likely have as many
>>>>>> racks as the AZs you choose to be in, because that's really the
>>>>>> point of using multiple AZs.
>>>>>>
>>>>>> R
>>>>>>
>>>>>> On 10/23/19, 4:06 AM, "Sergio Bilello" <lapostadiser...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>     Hello guys!
>>>>>>
>>>>>>     I was reading
>>>>>>     https://cassandra.apache.org/doc/latest/architecture/dynamo.html#networktopologystrategy
>>>>>>     and I would like to understand a concept related to node load
>>>>>>     balancing.
>>>>>>
>>>>>>     I know that Jon recommends vnodes = 4, but right now I found a
>>>>>>     cluster with vnodes = 256, replication factor = 3, and 2 racks.
>>>>>>     This is unbalanced because the number of racks is not a multiple
>>>>>>     of the replication factor.
>>>>>>
>>>>>>     My plan, however, is to move all the nodes into a single rack so
>>>>>>     that I can eventually scale the cluster up and down one node at
>>>>>>     a time.
>>>>>>
>>>>>>     If I had 3 racks and wanted to keep things balanced, I would
>>>>>>     have to scale up 3 nodes at a time, one per rack.
>>>>>>
>>>>>>     If I had 3 racks, should I also have 3 different datacenters,
>>>>>>     one datacenter per rack?
>>>>>>
>>>>>>     Can I have 2 datacenters and 3 racks? If that is possible, would
>>>>>>     one datacenter have more nodes than the other, and could that be
>>>>>>     a problem?
>>>>>>
>>>>>>     I am thinking of splitting my cluster into one datacenter for
>>>>>>     reads and one for writes, and keeping all the nodes in the same
>>>>>>     rack so I can scale one node at a time.
>>>>>>
>>>>>>     Please correct me if I am wrong.
>>>>>>
>>>>>>     Thanks,
>>>>>>
>>>>>>     Sergio
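>>>>>>     (For what it's worth, a minimal sketch of how each node would
>>>>>>     advertise its DC and rack under GossipingPropertyFileSnitch; the
>>>>>>     dc name and the AZ-as-rack value are illustrative, e.g. for a
>>>>>>     node of a "read" DC in us-east-1a:
>>>>>>
>>>>>>         # conf/cassandra-rackdc.properties (one per node)
>>>>>>         dc=read
>>>>>>         rack=us-east-1a
>>>>>>
>>>>>>         # conf/cassandra.yaml (relevant lines)
>>>>>>         endpoint_snitch: GossipingPropertyFileSnitch
>>>>>>         num_tokens: 4    # Jon's recommendation; this cluster is at 256
>>>>>>
>>>>>>     On AWS, Ec2Snitch instead derives the rack from the AZ
>>>>>>     automatically.)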
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org