Actually, I believe you are seeing the behavior described in the ticket I meant to link to, with the detailed exploration:
https://issues.apache.org/jira/browse/CASSANDRA-10238

=Rob

On Wed, Mar 23, 2016 at 2:06 PM, Anubhav Kale <anubhav.k...@microsoft.com> wrote:

> Oh, and the query I ran was "select * from racktest.racktable where id=1".
>
> From: Anubhav Kale [mailto:anubhav.k...@microsoft.com]
> Sent: Wednesday, March 23, 2016 2:04 PM
> To: user@cassandra.apache.org
> Subject: RE: Rack aware question.
>
> Thanks.
>
> To test what happens when the rack of a node changes in a running cluster without doing a decommission, I did the following.
>
> The cluster looks like below (this was run through Eclipse, hence the IP address hack):
>
> IP          Rack
> 127.0.0.1   R1
> 127.0.0.2   R1
> 127.0.0.3   R2
>
> A table was created and a row inserted as follows:
>
> cqlsh 127.0.0.1
> cqlsh> create keyspace racktest with replication = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 2 };
> cqlsh> create table racktest.racktable(id int, PRIMARY KEY(id));
> cqlsh> insert into racktest.racktable(id) values(1);
>
> nodetool getendpoints racktest racktable 1
>
> 127.0.0.2
> 127.0.0.3
>
> nodetool ring > ring_1.txt (attached)
>
> So far so good.
>
> Then I changed the racks to the layout below and restarted DSE with -Dcassandra.ignore_rack=true. From what I can tell, this option simply skips the startup check that compares the rack in system.local with the one in cassandra-rackdc.properties.
>
> IP          Rack
> 127.0.0.1   R1
> 127.0.0.2   R2
> 127.0.0.3   R1
>
> nodetool getendpoints racktest racktable 1
>
> 127.0.0.2
> 127.0.0.3
>
> So far so good; cqlsh returns the queries fine.
>
> nodetool ring > ring_2.txt (attached)
>
> Now comes the interesting part.
>
> I changed the racks to the layout below and restarted DSE.
>
> IP          Rack
> 127.0.0.1   R2
> 127.0.0.2   R1
> 127.0.0.3   R1
>
> nodetool getendpoints racktest racktable 1
>
> 127.0.0.1
> 127.0.0.3
>
> This is *very* interesting: cqlsh still returns the queries fine, and with tracing on it's clear that 127.0.0.1 is being asked for data as well.
>
> nodetool ring > ring_3.txt (attached)
>
> There is no change in token information across the ring_* files. The token in question for id=1 (from select token(id) from racktest.racktable) is -4069959284402364209.
>
> So, a few questions, because things don't add up:
>
> 1. How come 127.0.0.1 is shown as an endpoint holding the ID when its token range doesn't contain it? Does "nodetool ring" show all token ranges for a node or just the primary range? I am thinking it's only the primary. Can someone confirm?
> 2. How come queries contact 127.0.0.1?
> 3. Is "getendpoints" acting odd here, and is the data really on 127.0.0.2? To prove/disprove that, I stopped 127.0.0.2 and ran a query with CONSISTENCY ALL, and it came back just fine, meaning 127.0.0.1 does indeed hold the data (the SSTables also show it).
> 4. So, does this mean that the data actually gets moved around when racks change?
>
> Thanks!
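(As a side note on points 1-3: the same placement question can be cross-checked from the client side. Below is a minimal sketch using the DataStax Python driver; it assumes the three-node test cluster and keyspace from the mail above and that cassandra-driver is installed. It prints the hosts the driver considers replicas for id=1, along with the rack it thinks each host is in.)

from cassandra.cluster import Cluster

# Connect to the test cluster and keyspace from the example above.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('racktest')

# Prepare the statement so the driver computes the routing key for id=1.
bound = session.prepare('SELECT * FROM racktable WHERE id = ?').bind((1,))

# Ask the driver's metadata which hosts are replicas for that partition key,
# and print each host with the rack the driver believes it is in.
for host in cluster.metadata.get_replicas('racktest', bound.routing_key):
    print(host.address, host.rack)

cluster.shutdown()

Run before and after a rack change, this should report the same swap that nodetool getendpoints shows above, since the driver derives placement from the same topology and replication settings.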
>
> From: Robert Coli [mailto:rc...@eventbrite.com]
> Sent: Wednesday, March 23, 2016 11:59 AM
> To: user@cassandra.apache.org
> Subject: Re: Rack aware question.
>
> On Wed, Mar 23, 2016 at 8:07 AM, Anubhav Kale <anubhav.k...@microsoft.com> wrote:
>
> > Suppose we change the racks on VMs on a running cluster. (We need to do this while running on Azure, because sometimes when a VM gets moved its rack changes.)
> >
> > In this situation, new writes will be laid out based on the new rack info on the appropriate replicas. What happens to existing data? Is that data moved around as well, and does that happen when we run repair or on its own?
>
> First, you should understand this ticket if relying on rack awareness:
>
> https://issues.apache.org/jira/browse/CASSANDRA-3810
>
> Second, in general nodes cannot move between racks.
>
> https://issues.apache.org/jira/browse/CASSANDRA-10242
>
> has some detailed explanations of what blows up if they do.
>
> Note that if you want to preserve any of the data on the node, you need to:
>
> 1) bring it up and have it join the ring in its new rack (during which time it will serve incorrect reads due to missing data)
> 2) stop it
> 3) run cleanup
> 4) run repair
> 5) start it again
>
> Can't really say that I recommend this practice, but it's better than "rebootstrap it", which is the official advice. If you "rebootstrap it" you decrease the unique replica count by 1, which has a nonzero chance of data loss. The Coli Conjecture says that in practice you probably don't care about this nonzero chance of data loss if you are running your application in CL.ONE, which should be all cases where it matters.
>
> =Rob
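For anyone puzzling over why the replica set changes at all when only a rack label changes, here is a deliberately simplified, single-datacenter sketch of the rack-aware walk NetworkTopologyStrategy does around the ring: start at the token that owns the key and move clockwise, preferring nodes in racks not used yet. The node names and tokens are made up, and it ignores vnodes (which the test cluster above is very likely using), so it will not reproduce the exact endpoints in the ring_*.txt files; it only illustrates the rule.

def rack_aware_replicas(ring, key_token, rf):
    """ring: list of (token, node, rack) tuples sorted by token.
    Walk clockwise from the key's token, preferring racks not seen yet."""
    # The first node whose token is >= the key's token owns the key (wrap to 0).
    start = next((i for i, (t, _, _) in enumerate(ring) if t >= key_token), 0)
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        _, node, rack = ring[(start + i) % len(ring)]
        if rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
        else:
            skipped.append(node)  # only used if we run out of distinct racks
        if len(replicas) == rf:
            return replicas
    return (replicas + skipped)[:rf]

# Three nodes, one token each (made-up values), RF = 2.
print(rack_aware_replicas(
    [(-100, 'A', 'R1'), (0, 'B', 'R1'), (100, 'C', 'R2')], key_token=-50, rf=2))
# -> ['B', 'C']

# Same tokens, but the first node is now in rack R2 instead of R1:
print(rack_aware_replicas(
    [(-100, 'A', 'R2'), (0, 'B', 'R1'), (100, 'C', 'R1')], key_token=-50, rf=2))
# -> ['B', 'A']  (the replica set changed without any token moving)

This is only the placement rule; whether the existing data follows the new placement is exactly what the tickets above discuss.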