Using CL:ALL basically forces you to always include the first replica in the query. The first replica will be the same for both SimpleStrategy/SimpleSnitch and NetworkTopologyStrategy/EC2Snitch. It's the only way we can guarantee we're not going to lose a row that was written only to the second and third replicas while the first replica was down, in case the second and third replicas move to different hosts (racks / availability zones) during the ALTER.
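As a minimal illustration of running at that level (my_keyspace, my_table, and the key are placeholders; a real application would set the equivalent consistency in its driver), in cqlsh:

    cqlsh> CONSISTENCY ALL;
    cqlsh> SELECT * FROM my_keyspace.my_table WHERE id = 42;
    -- the read now succeeds only if every replica for that partition responds

The trade-off, as discussed below in the thread, is that a single down replica makes reads and writes at ALL fail.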
On Mon, Sep 18, 2017 at 1:57 PM, Myron A. Semack <msem...@kcftech.com> wrote:

> How would setting the consistency to ALL help? Wouldn't that just cause
> EVERY read/write to fail after the ALTER until the repair is complete?
>
> Sincerely,
> Myron A. Semack
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Monday, September 18, 2017 2:42 PM
> To: user@cassandra.apache.org
> Subject: Re: Re[6]: Modify keyspace replication strategy and rebalance
> the nodes
>
> The hard part here is that nobody's going to be able to tell you exactly
> what's involved in fixing this, because nobody sees your ring.
>
> And since you're using vnodes and have a nontrivial number of instances,
> sharing that ring (and doing anything actionable with it) is nontrivial.
>
> If you weren't using vnodes, you could just fix the distribution and
> decommission the extra nodes afterward.
>
> I thought - but don't have time or energy to check - that the ec2snitch
> would be rack aware even when using SimpleStrategy. If that's not the
> case (as you seem to indicate), then you're in a weird spot: you can't
> go to NTS trivially, because doing so will reassign your replicas to be
> rack/AZ aware, certainly violating your consistency guarantees.
>
> If you can change your app to temporarily write with ALL and read with
> ALL, then run repair, then immediately ALTER the keyspace, then run
> repair again, then drop back to whatever consistency you're using, you
> can probably get through it. The challenge is that ALL gets painful if
> you lose any instance.
>
> But please test in a lab, and note that this is inherently dangerous.
> I'm not advising you to do it, though I do believe it can be made to
> work.
>
> --
> Jeff Jirsa
>
> On Sep 18, 2017, at 11:18 AM, Dominik Petrovic
> <dominik.petro...@mail.ru.INVALID> wrote:
>
> @jeff what do you think is the best approach here to fix this problem?
> Thank you all for helping me.
>
> Thursday, September 14, 2017 3:28 PM -07:00 from kurt greaves
> <k...@instaclustr.com>:
>
> Sorry, that only applies if you're using NTS. You're right that
> SimpleStrategy won't work very well in this case. To migrate you'll
> likely need to do a DC migration to ensure no downtime, as replica
> placement will change even if RF stays the same.
>
> On 15 Sep. 2017 08:26, "kurt greaves" <k...@instaclustr.com> wrote:
>
> If you have racks configured and lose nodes, you should replace the
> node with one from the same rack. You then need to repair, and
> definitely don't decommission until you do.
>
> Also, 40 nodes with 256 vnodes is not a fun time for repair.
>
> On 15 Sep. 2017 03:36, "Dominik Petrovic"
> <dominik.petro...@mail.ru.invalid> wrote:
>
> @jeff,
> I'm using 3 availability zones. During the life of the cluster we lost
> some nodes and retired others, and we ended up with some of the data
> written/replicated in a single availability zone. We saw it with
> nodetool getendpoints.
> Regards
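For reference, the check Dominik mentions looks like this - nodetool getendpoints prints the replicas that own a given partition key (keyspace, table, and key are placeholders, and the addresses are purely illustrative):

    $ nodetool getendpoints my_keyspace my_table 42
    10.0.1.12
    10.0.1.25
    10.0.1.31

Three addresses from the same AZ's subnet for a key is the symptom described above; running this for a few keys before and after the ALTER shows whether replica placement actually moved.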
> Thursday, September 14, 2017 9:23 AM -07:00 from Jeff Jirsa
> <jji...@gmail.com>:
>
> With one datacenter/region, what did you discover in an outage that you
> think you'll solve with NetworkTopologyStrategy? It should be
> equivalent for a single DC.
>
> --
> Jeff Jirsa
>
> On Sep 14, 2017, at 8:47 AM, Dominik Petrovic
> <dominik.petro...@mail.ru.INVALID> wrote:
>
> Thank you for the replies!
>
> @jeff my current cluster details are:
> 1 datacenter
> 40 nodes, with vnodes=256
> RF=3
> What is your advice? It is a production cluster, so I need to be very
> careful with it.
> Regards
>
> Thu, 14 Sep 2017 -2:47:52 -0700 from Jeff Jirsa <jji...@gmail.com>:
>
> The token distribution isn't going to change - the way Cassandra maps
> replicas will change.
>
> How many datacenters/regions will you have when you're done? What's
> your RF now? You definitely need to run repair before you ALTER, but
> you've got a bit of a race here between the repairs and the ALTER,
> which you MAY be able to work around if we know more about your
> cluster.
>
> How many nodes?
> How many regions?
> How many replicas per region when you're done?
>
> --
> Jeff Jirsa
>
> On Sep 13, 2017, at 2:04 PM, Dominik Petrovic
> <dominik.petro...@mail.ru.INVALID> wrote:
>
> Dear community,
> I'd like to receive additional info on how to modify a keyspace
> replication strategy.
>
> My Cassandra cluster is on AWS, Cassandra 2.1.15 using vnodes. The
> cluster's snitch is configured to Ec2Snitch, but the keyspace the
> developers created uses replication class SimpleStrategy with RF = 3.
>
> During an outage last week we noticed the discrepancy in the
> configuration, and we would now like to fix the issue by switching to
> NetworkTopologyStrategy.
>
> What are the suggested steps to perform?
> For Cassandra 2.1 I found only this doc:
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsChangeKSStrategy.html
> which does not mention anything about repairing the cluster.
>
> For Cassandra 3 I found this other doc:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsChangeKSStrategy.html
> which also involves the cluster repair operation.
>
> On a test cluster I tried the steps for Cassandra 2.1, but the token
> distribution in the ring didn't change, so I'm assuming that wasn't the
> right thing to do. I also performed a nodetool repair -pr, but nothing
> changed either. Any advice?
>
> --
> Dominik Petrovic
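Putting the thread together, the sequence Jeff outlines might look roughly like the sketch below. It is only a sketch: my_keyspace and the datacenter name 'us-east' are placeholders, and with Ec2Snitch the name used in the ALTER must match whatever datacenter name the snitch reports in nodetool status.

    # 0. Switch the application to read AND write at CL ALL.

    # 1. Repair the keyspace on every node.
    $ nodetool repair my_keyspace

    # 2. Change the replication strategy; the DC name must match the snitch.
    cqlsh> ALTER KEYSPACE my_keyspace
       ... WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};

    # 3. Repair every node again, then drop the application back to its
    #    normal consistency level.
    $ nodetool repair my_keyspace

As Jeff notes above, the ALTER changes how replicas are mapped, not the token distribution, so the ring will look unchanged afterward. Note also that repair -pr covers only each node's primary ranges; after a replica-placement change, a full repair on every node is the safer choice.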