Sorry, you’re right. This is what happens when you try to do two things at once. Google too quickly, look like an idiot. Thanks for the correction.
> On Sep 18, 2017, at 1:37 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> For what it's worth, the problem isn't the snitch, it's the replication
> strategy - he's using the right snitch, but SimpleStrategy ignores it.
>
> That's the same reason that adding a new DC doesn't work - the replication
> strategy is DC-agnostic, and changing it safely IS the problem.
>
> --
> Jeff Jirsa
>
>
> On Sep 18, 2017, at 11:46 AM, Jon Haddad <jonathan.had...@gmail.com> wrote:
>
>> For those of you who like trivia, SimpleSnitch is hard coded to report every
>> node as being in DC "datacenter1" and rack "rack1"; there's no way around it.
>> https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/locator/SimpleSnitch.java#L28
>>
>> I would do this by setting up a new DC. Trying to do it with the existing
>> one is going to leave you in a state where most queries will return
>> incorrect results (2/3 of queries at ONE and 1/2 of queries at QUORUM) until
>> you finish repair.
>>
>> On Sep 18, 2017, at 11:41 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> The hard part here is that nobody's going to be able to tell you exactly
>>> what's involved in fixing this, because nobody sees your ring.
>>>
>>> And since you're using vnodes and have a nontrivial number of instances,
>>> sharing that ring (and doing anything actionable with it) is nontrivial.
>>>
>>> If you weren't using vnodes, you could just fix the distribution and decom
>>> the extra nodes afterward.
>>>
>>> I thought - but don't have time or energy to check - that the Ec2Snitch
>>> would be rack aware even when using SimpleStrategy. If that's not the
>>> case (as you seem to indicate), then you're in a weird spot - you can't go
>>> to NTS trivially, because doing so will reassign your replicas to be
>>> rack/AZ aware, certainly violating your consistency guarantees.
>>>
>>> If you can change your app to temporarily write with ALL and read with ALL,
>>> then run repair, then immediately ALTER the keyspace, then run repair
>>> again, then drop back to whatever consistency you're using, you can
>>> probably get through it. The challenge is that ALL gets painful if you lose
>>> any instance.
>>>
>>> But please test in a lab, and note that this is inherently dangerous. I'm
>>> not advising you to do it, though I do believe it can be made to work.
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Sep 18, 2017, at 11:18 AM, Dominik Petrovic <dominik.petro...@mail.ru.INVALID> wrote:
>>>
>>>> @jeff what do you think is the best approach here to fix this problem?
>>>> Thank you all for helping me.
>>>>
>>>>
>>>> Thursday, September 14, 2017 3:28 PM -07:00 from kurt greaves <k...@instaclustr.com>:
>>>>
>>>> Sorry, that only applies if you're using NTS. You're right that
>>>> SimpleStrategy won't work very well in this case. To migrate you'll
>>>> likely need to do a DC migration to ensure no downtime, as replica
>>>> placement will change even if RF stays the same.
>>>>
>>>> On 15 Sep. 2017 08:26, "kurt greaves" <k...@instaclustr.com> wrote:
>>>> If you have racks configured and lose nodes, you should replace the node
>>>> with one from the same rack. You then need to repair, and definitely
>>>> don't decommission until you do.
>>>>
>>>> Also, 40 nodes with 256 vnodes is not a fun time for repair.
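A minimal sketch of the in-place sequence Jeff describes above, assuming a
keyspace named my_keyspace with RF=3 in a single data center that Ec2Snitch
reports as "us-east" (the keyspace and DC names are placeholders; use whatever
`nodetool status` shows for your cluster). This is illustrative only, not a
tested runbook:

    # 1. Switch the application to consistency level ALL for both reads and
    #    writes (a client-side change; no server command involved).

    # 2. Repair the keyspace on every node so all current replicas agree.
    #    On 2.1.x a plain repair is a full (non-incremental) repair by default.
    nodetool repair my_keyspace

    # 3. Change the replication strategy. The DC name must match the name the
    #    snitch reports for your region, as shown by `nodetool status`.
    cqlsh> ALTER KEYSPACE my_keyspace
       ... WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};

    # 4. Repair again, on every node, so data is streamed to the newly
    #    assigned replicas.
    nodetool repair my_keyspace

    # 5. Only after the second repair completes everywhere, drop the
    #    application back to its normal consistency level.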
>>>> On 15 Sep. 2017 03:36, "Dominik Petrovic" <dominik.petro...@mail.ru.invalid> wrote:
>>>> @jeff,
>>>> I'm using 3 availability zones. During the life of the cluster we lost
>>>> some nodes and retired others, and we ended up having some of the data
>>>> written/replicated on a single availability zone. We saw it with
>>>> nodetool getendpoints.
>>>> Regards
>>>>
>>>>
>>>> Thursday, September 14, 2017 9:23 AM -07:00 from Jeff Jirsa <jji...@gmail.com>:
>>>>
>>>> With one datacenter/region, what did you discover in an outage that you
>>>> think you'll solve with NetworkTopologyStrategy? It should be equivalent
>>>> for a single DC.
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Sep 14, 2017, at 8:47 AM, Dominik Petrovic <dominik.petro...@mail.ru.INVALID> wrote:
>>>>
>>>>> Thank you for the replies!
>>>>>
>>>>> @jeff my current cluster details are:
>>>>> 1 datacenter
>>>>> 40 nodes, with vnodes=256
>>>>> RF=3
>>>>> What is your advice? It is a production cluster, so I need to be very
>>>>> careful about it.
>>>>> Regards
>>>>>
>>>>>
>>>>> Thu, 14 Sep 2017 2:47:52 -0700 from Jeff Jirsa <jji...@gmail.com>:
>>>>>
>>>>> The token distribution isn't going to change - the way Cassandra maps
>>>>> replicas will change.
>>>>>
>>>>> How many data centers/regions will you have when you're done? What's your
>>>>> RF now? You definitely need to run repair before you ALTER, but you've
>>>>> got a bit of a race here between the repairs and the ALTER, which you MAY
>>>>> be able to work around if we know more about your cluster.
>>>>>
>>>>> How many nodes?
>>>>> How many regions?
>>>>> How many replicas per region when you're done?
>>>>>
>>>>> --
>>>>> Jeff Jirsa
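Replica placement (as opposed to token ownership) can be inspected before and
after the change with nodetool getendpoints, which is how the single-AZ
placement was spotted earlier in this thread. The keyspace, table, and
partition key below are placeholders:

    # Show which nodes currently hold the replicas of one partition.
    # Run it for a few representative keys before and after the ALTER;
    # the returned addresses are what should change, not the tokens.
    nodetool getendpoints my_keyspace my_table some_partition_key

    # Cross-check the returned IPs against their DC and rack (availability
    # zone) assignments.
    nodetool status my_keyspace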
>>>>> On Sep 13, 2017, at 2:04 PM, Dominik Petrovic <dominik.petro...@mail.ru.INVALID> wrote:
>>>>>
>>>>>> Dear community,
>>>>>> I'd like to receive additional info on how to modify a keyspace
>>>>>> replication strategy.
>>>>>>
>>>>>> My Cassandra cluster is on AWS, Cassandra 2.1.15 using vnodes. The
>>>>>> cluster's snitch is configured to Ec2Snitch, but the keyspace the
>>>>>> developers created has replication class SimpleStrategy with RF = 3.
>>>>>>
>>>>>> During an outage last week we realized the discrepancy in the
>>>>>> configuration, and we would now like to fix the issue using
>>>>>> NetworkTopologyStrategy.
>>>>>>
>>>>>> What are the suggested steps to perform?
>>>>>> For Cassandra 2.1 I found only this doc:
>>>>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsChangeKSStrategy.html
>>>>>> which does not mention anything about repairing the cluster.
>>>>>>
>>>>>> For Cassandra 3 I found this other doc:
>>>>>> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsChangeKSStrategy.html
>>>>>> which also involves a cluster repair operation.
>>>>>>
>>>>>> On a test cluster I tried the steps for Cassandra 2.1, but the token
>>>>>> distribution in the ring didn't change, so I'm assuming that wasn't the
>>>>>> right thing to do. I also performed a nodetool repair -pr, but nothing
>>>>>> changed either. Any advice?
>>>>>>
>>>>>> --
>>>>>> Dominik Petrovic
>>>>>
>>>>>
>>>>> --
>>>>> Dominik Petrovic
>>>>
>>>>
>>>> --
>>>> Dominik Petrovic
>>>>
>>>>
>>>> --
>>>> Dominik Petrovic
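Finally, a small set of illustrative checks for the situation in the original
question (the keyspace name is a placeholder): after an ALTER KEYSPACE the
token ring is expected to stay the same, so the useful things to look at are
the keyspace definition and the snitch's view of the topology rather than the
ring itself.

    # What replication strategy and options does the keyspace declare now?
    cqlsh> DESCRIBE KEYSPACE my_keyspace;

    # What DC and rack (region / availability zone) does Ec2Snitch report
    # for each node?
    nodetool status

    # Token ownership; this is NOT expected to change when the strategy
    # changes, which is why the ring looked identical in the test above.
    nodetool ring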