Actually, I believe you are seeing the behavior described in the ticket I meant to link to, with the detailed exploration:
https://issues.apache.org/jira/browse/CASSANDRA-10238

=Rob

On Wed, Mar 23, 2016 at 2:06 PM, Anubhav Kale <anubhav.k...@microsoft.com> wrote:

> Oh, and the query I ran was "select * from racktest.racktable where id=1".
>
> From: Anubhav Kale [mailto:anubhav.k...@microsoft.com]
> Sent: Wednesday, March 23, 2016 2:04 PM
> To: user@cassandra.apache.org
> Subject: RE: Rack aware question.
>
> Thanks.
>
> To test what happens when the rack of a node changes in a running cluster without doing a decommission, I did the following.
>
> The cluster looks like below (this was run through Eclipse, hence the IP address hack):
>
> IP          Rack
> 127.0.0.1   R1
> 127.0.0.2   R1
> 127.0.0.3   R2
>
> A table was created and a row inserted as follows:
>
> cqlsh 127.0.0.1
> cqlsh> create keyspace racktest with replication = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 2 };
> cqlsh> create table racktest.racktable(id int, PRIMARY KEY(id));
> cqlsh> insert into racktest.racktable(id) values(1);
>
> nodetool getendpoints racktest racktable 1
>
> 127.0.0.2
> 127.0.0.3
>
> nodetool ring > ring_1.txt (attached)
>
> So far so good.
>
> Then I changed the racks to the layout below and restarted DSE with -Dcassandra.ignore_rack=true. From what I can tell, this option simply skips the startup check that compares the rack in system.local with the one in cassandra-rackdc.properties.
>
> IP          Rack
> 127.0.0.1   R1
> 127.0.0.2   R2
> 127.0.0.3   R1
>
> nodetool getendpoints racktest racktable 1
>
> 127.0.0.2
> 127.0.0.3
>
> So far so good; cqlsh returns the queries fine.
>
> nodetool ring > ring_2.txt (attached)
>
> Now comes the interesting part.
>
> I changed the racks to the layout below and restarted DSE.
>
> IP          Rack
> 127.0.0.1   R2
> 127.0.0.2   R1
> 127.0.0.3   R1
>
> nodetool getendpoints racktest racktable 1
>
> 127.0.0.1
> 127.0.0.3
>
> This is *very* interesting: cqlsh still returns the queries fine, and with tracing on it's clear that 127.0.0.1 is being asked for data as well.
>
> nodetool ring > ring_3.txt (attached)
>
> There is no change in token information across the ring_* files. The token in question for id=1 (from select token(id) from racktest.racktable) is -4069959284402364209.
>
> So, a few questions, because things don't add up:
>
> 1. How come 127.0.0.1 is shown as an endpoint holding the ID when its token range doesn't contain it? Does "nodetool ring" show all token ranges for a node or just the primary range? I am thinking it's only the primary. Can someone confirm?
> 2. How come queries contact 127.0.0.1?
> 3. Is "getendpoints" acting odd here, and is the data really on 127.0.0.2? To prove/disprove that, I stopped 127.0.0.2 and ran a query with CONSISTENCY ALL, and it came back just fine, meaning 127.0.0.1 does indeed hold the data (the SSTables also show it).
> 4. So, does this mean that the data actually gets moved around when racks change?
>
> Thanks!
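(As a side note on points 1-3: the same placement question can be cross-checked from the client side. Below is a minimal sketch using the DataStax Python driver; it assumes the three-node test cluster and keyspace from the mail above and that cassandra-driver is installed. It prints the hosts the driver considers replicas for id=1, along with the rack it thinks each host is in.)

from cassandra.cluster import Cluster

# Connect to the test cluster and keyspace from the example above.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('racktest')

# Prepare the statement so the driver computes the routing key for id=1.
bound = session.prepare('SELECT * FROM racktable WHERE id = ?').bind((1,))

# Ask the driver's metadata which hosts are replicas for that partition key,
# and print each host with the rack the driver believes it is in.
for host in cluster.metadata.get_replicas('racktest', bound.routing_key):
    print(host.address, host.rack)

cluster.shutdown()

Run before and after a rack change, this should report the same swap that nodetool getendpoints shows above, since the driver derives placement from the same topology and replication settings.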
>
> From: Robert Coli [mailto:rc...@eventbrite.com]
> Sent: Wednesday, March 23, 2016 11:59 AM
> To: user@cassandra.apache.org
> Subject: Re: Rack aware question.
>
> On Wed, Mar 23, 2016 at 8:07 AM, Anubhav Kale <anubhav.k...@microsoft.com> wrote:
>
> > Suppose we change the racks on VMs on a running cluster. (We need to do this while running on Azure, because sometimes when a VM gets moved its rack changes.)
> >
> > In this situation, new writes will be laid out based on the new rack info on the appropriate replicas. What happens to existing data? Is that data moved around as well, and does that happen when we run repair or on its own?
>
> First, you should understand this ticket if relying on rack awareness:
>
> https://issues.apache.org/jira/browse/CASSANDRA-3810
>
> Second, in general nodes cannot move between racks.
>
> https://issues.apache.org/jira/browse/CASSANDRA-10242
>
> has some detailed explanations of what blows up if they do.
>
> Note that if you want to preserve any of the data on the node, you need to:
>
> 1) bring it up and have it join the ring in its new rack (during which time it will serve incorrect reads due to missing data)
> 2) stop it
> 3) run cleanup
> 4) run repair
> 5) start it again
>
> Can't really say that I recommend this practice, but it's better than "rebootstrap it", which is the official advice. If you "rebootstrap it" you decrease the unique replica count by 1, which has a nonzero chance of data loss. The Coli Conjecture says that in practice you probably don't care about this nonzero chance of data loss if you are running your application in CL.ONE, which should be all cases where it matters.
>
> =Rob
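For anyone puzzling over why the replica set changes at all when only a rack label changes, here is a deliberately simplified, single-datacenter sketch of the rack-aware walk NetworkTopologyStrategy does around the ring: start at the token that owns the key and move clockwise, preferring nodes in racks not used yet. The node names and tokens are made up, and it ignores vnodes (which the test cluster above is very likely using), so it will not reproduce the exact endpoints in the ring_*.txt files; it only illustrates the rule.

def rack_aware_replicas(ring, key_token, rf):
    """ring: list of (token, node, rack) tuples sorted by token.
    Walk clockwise from the key's token, preferring racks not seen yet."""
    # The first node whose token is >= the key's token owns the key (wrap to 0).
    start = next((i for i, (t, _, _) in enumerate(ring) if t >= key_token), 0)
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        _, node, rack = ring[(start + i) % len(ring)]
        if rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
        else:
            skipped.append(node)  # only used if we run out of distinct racks
        if len(replicas) == rf:
            return replicas
    return (replicas + skipped)[:rf]

# Three nodes, one token each (made-up values), RF = 2.
print(rack_aware_replicas(
    [(-100, 'A', 'R1'), (0, 'B', 'R1'), (100, 'C', 'R2')], key_token=-50, rf=2))
# -> ['B', 'C']

# Same tokens, but the first node is now in rack R2 instead of R1:
print(rack_aware_replicas(
    [(-100, 'A', 'R2'), (0, 'B', 'R1'), (100, 'C', 'R1')], key_token=-50, rf=2))
# -> ['B', 'A']  (the replica set changed without any token moving)

This is only the placement rule; whether the existing data follows the new placement is exactly what the tickets above discuss.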