Sorry, you’re right.  This is what happens when you try to do two things at 
once.  Google too quickly, look like an idiot.  Thanks for the correction.


> On Sep 18, 2017, at 1:37 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> For what it's worth, the problem isn't the snitch, it's the replication 
> strategy - he's using the right snitch, but SimpleStrategy ignores it.
> 
> That's the same reason that adding a new DC doesn't work - the replication 
> strategy is DC-agnostic, and changing it safely IS the problem.
> 
> 
> 
> -- 
> Jeff Jirsa
> 
> 
> On Sep 18, 2017, at 11:46 AM, Jon Haddad <jonathan.had...@gmail.com> wrote:
> 
>> For those of you who like trivia, SimpleSnitch is hard-coded to report every 
>> node as being in datacenter “datacenter1” and rack “rack1”; there’s no way around it.  
>> https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/locator/SimpleSnitch.java#L28
>> 
>> I would do this by setting up a new DC; trying to do it with the existing 
>> one is going to leave you in a state where most queries will return 
>> incorrect results (2/3 of queries at ONE and 1/2 of queries at QUORUM) until 
>> you finish repair.
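>> 
>> A rough sketch of that path, with placeholder keyspace and DC names (and 
>> note Jeff's caveat above that the strategy change itself is the hard part):
>> 
>>     -- add the new DC to the keyspace definition (NTS, per-DC RF)
>>     ALTER KEYSPACE my_ks WITH replication =
>>       {'class': 'NetworkTopologyStrategy', 'us-east': 3, 'us-east-new': 3};
>> 
>>     # then, on each node in the new DC, stream data from the old DC
>>     nodetool rebuild us-east
>> 
>> Once clients have moved to the new DC, you can drop the old DC from the 
>> keyspace definition and decommission its nodes.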
>> 
>>> On Sep 18, 2017, at 11:41 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>> 
>>> The hard part here is that nobody's going to be able to tell you exactly what's 
>>> involved in fixing this, because nobody sees your ring.
>>> 
>>> And since you're using vnodes and have a nontrivial number of instances, 
>>> sharing that ring (and doing anything actionable with it) is nontrivial. 
>>> 
>>> If you weren't using vnodes, you could just fix the distribution and decom 
>>> extra nodes afterward. 
>>> 
>>> I thought - but don't have time or energy to check - that the Ec2Snitch 
>>> would be rack-aware even when using SimpleStrategy - if that's not the 
>>> case (as you seem to indicate), then you're in a weird spot - you can't go 
>>> to NTS trivially, because doing so will reassign your replicas to be rack/AZ 
>>> aware, certainly violating your consistency guarantees.
>>> 
>>> If you can change your app to temporarily write with ALL and read with ALL, 
>>> and then run repair, then immediately ALTER the keyspace, then run repair 
>>> again, then drop back to whatever consistency you're using, you can 
>>> probably get through it. The challenge is that ALL gets painful if you lose 
>>> any instance.
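>>> 
>>> In rough cqlsh/nodetool terms that sequence would look something like this 
>>> (keyspace and DC names are placeholders; the CL changes happen in the app):
>>> 
>>>     # clients temporarily reading and writing at CL=ALL
>>>     nodetool repair my_ks                 # full repair under SimpleStrategy
>>> 
>>>     -- switch strategies; the DC name must match what Ec2Snitch reports
>>>     ALTER KEYSPACE my_ks WITH replication =
>>>       {'class': 'NetworkTopologyStrategy', 'us-east': 3};
>>> 
>>>     nodetool repair my_ks                 # repair again under NTS
>>>     # then drop clients back to the normal consistency level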
>>> 
>>> But please test in a lab, and note that this is inherently dangerous; I'm 
>>> not advising you to do it, though I do believe it can be made to work.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>> On Sep 18, 2017, at 11:18 AM, Dominik Petrovic 
>>> <dominik.petro...@mail.ru.INVALID> wrote:
>>> 
>>>> @jeff what do you think is the best approach here to fix this problem?
>>>> Thank you all for helping me.
>>>> 
>>>> 
>>>> Thursday, September 14, 2017 3:28 PM -07:00 from kurt greaves 
>>>> <k...@instaclustr.com>:
>>>> 
>>>> Sorry, that only applies if you're using NTS. You're right that 
>>>> SimpleStrategy won't work very well in this case. To migrate you'll likely 
>>>> need to do a DC migration to ensure no downtime, as replica placement will 
>>>> change even if RF stays the same.
>>>> 
>>>> On 15 Sep. 2017 08:26, "kurt greaves" <k...@instaclustr.com> wrote:
>>>> If you have racks configured and lose nodes, you should replace the node 
>>>> with one from the same rack. You then need to repair, and definitely don't 
>>>> decommission until you do.
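>>>> 
>>>> (For reference: a replacement is usually done by starting the new node, in 
>>>> the same rack/AZ, with the replace_address flag set in cassandra-env.sh, 
>>>> e.g. JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>", 
>>>> rather than decommissioning and re-bootstrapping.)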
>>>> 
>>>> Also 40 nodes with 256 vnodes is not a fun time for repair.
>>>> 
>>>> On 15 Sep. 2017 03:36, "Dominik Petrovic" 
>>>> <dominik.petro...@mail.ru.invalid> wrote:
>>>> @jeff,
>>>> I'm using 3 availability zones. During the life of the cluster we lost 
>>>> some nodes and retired others, and we ended up with some of the data 
>>>> written/replicated in a single availability zone. We saw it with nodetool 
>>>> getendpoints.
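>>>> For example (keyspace, table, and key are placeholders):
>>>> 
>>>>     nodetool getendpoints my_ks my_table <partition_key>
>>>> 
>>>> returned three replicas in the same AZ for some keys.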
>>>> Regards 
>>>> 
>>>> 
>>>> Thursday, September 14, 2017 9:23 AM -07:00 from Jeff Jirsa 
>>>> <jji...@gmail.com>:
>>>> 
>>>> With one datacenter/region, what did you discover in an outage that you 
>>>> think you'll solve with NetworkTopologyStrategy? It should be equivalent 
>>>> for a single DC. 
>>>> 
>>>> -- 
>>>> Jeff Jirsa
>>>> 
>>>> 
>>>> On Sep 14, 2017, at 8:47 AM, Dominik Petrovic 
>>>> <dominik.petro...@mail.ru.INVALID> wrote:
>>>> 
>>>>> Thank you for the replies!
>>>>> 
>>>>> @jeff my current cluster details are:
>>>>> 1 datacenter
>>>>> 40 nodes, with vnodes=256
>>>>> RF=3
>>>>> What is your advice? It is a production cluster, so I need to be very 
>>>>> careful about it.
>>>>> Regards
>>>>> 
>>>>> 
>>>>> Thu, 14 Sep 2017 -2:47:52 -0700 from Jeff Jirsa <jji...@gmail.com>:
>>>>> 
>>>>> The token distribution isn't going to change - the way Cassandra maps 
>>>>> replicas will change. 
>>>>> 
>>>>> How many data centers/regions will you have when you're done? What's your 
>>>>> RF now? You definitely need to run repair before you ALTER, but you've 
>>>>> got a bit of a race here between the repairs and the ALTER, which you MAY 
>>>>> be able to work around if we know more about your cluster.
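>>>>> 
>>>>> One way to see this on a test cluster (keyspace/table/key are placeholders):
>>>>> 
>>>>>     nodetool ring                                # token assignments: unchanged
>>>>>     nodetool getendpoints my_ks my_table <key>   # replica set: can change
>>>>> 
>>>>> run before and after the ALTER.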
>>>>> 
>>>>> How many nodes?
>>>>> How many regions?
>>>>> How many replicas per region when you're done?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Jeff Jirsa
>>>>> 
>>>>> 
>>>>> On Sep 13, 2017, at 2:04 PM, Dominik Petrovic 
>>>>> <dominik.petro...@mail.ru.INVALID> wrote:
>>>>> 
>>>>>> Dear community,
>>>>>> I'd like to receive additional info on how to modify a keyspace 
>>>>>> replication strategy.
>>>>>> 
>>>>>> My Cassandra cluster is on AWS, running Cassandra 2.1.15 with vnodes. The 
>>>>>> cluster's snitch is configured as Ec2Snitch, but the keyspace the 
>>>>>> developers created uses replication class SimpleStrategy with RF = 3.
>>>>>> 
>>>>>> During an outage last week we noticed the discrepancy in the 
>>>>>> configuration, and we would now like to fix the issue by switching to 
>>>>>> NetworkTopologyStrategy. 
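>>>>>> 
>>>>>> For context, the mismatch is visible with something like (keyspace name 
>>>>>> is a placeholder):
>>>>>> 
>>>>>>     cqlsh> DESCRIBE KEYSPACE my_ks;    -- shows SimpleStrategy, RF 3
>>>>>>     $ nodetool describecluster         # shows the configured snitch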
>>>>>> 
>>>>>> What are the suggested steps to perform?
>>>>>> For Cassandra 2.1 I found only this doc: 
>>>>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsChangeKSStrategy.html
>>>>>> which does not mention anything about repairing the cluster.
>>>>>> 
>>>>>> For Cassandra 3 I found this other doc: 
>>>>>> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsChangeKSStrategy.html
>>>>>> That one also involves the cluster repair operation.
>>>>>> 
>>>>>> On a test cluster I tried the steps for Cassandra 2.1, but the token 
>>>>>> distribution in the ring didn't change, so I'm assuming that wasn't the 
>>>>>> right thing to do.
>>>>>> I also performed a nodetool repair -pr, but nothing changed either.
>>>>>> Any advice?
>>>>>> 
>>>>>> -- 
>>>>>> Dominik Petrovic
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Dominik Petrovic
>>>> 
>>>> 
>>>> -- 
>>>> Dominik Petrovic
>> 
