What I would expect to happen is for the removed node to disappear from the ring, and for the nodes that are supposed to get more data to start streaming it over. I would also expect it to be hours before any new data starts appearing anywhere, since you are anticompacting 80+ GB prior to the streaming part. http://wiki.apache.org/cassandra/Streaming
On Tue, Jun 22, 2010 at 12:57 AM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> Yes, although "forget" implies that we once knew we were supposed to do so.
> Given the following before-and-after states, on which nodes are we supposed
> to run repair? Should the cluster be restarted? Is there anything else we
> should be doing, or not doing?
>
> 1. Node is down due to hardware failure
>
> 192.168.1.104  Up    111.75 GB  8954799129498380617457226511362321354
> 192.168.1.106  Up    113.25 GB  17909598258996761234914453022724642708
> 192.168.1.107  Up     75.65 GB  22386997823745951543643066278405803385
> 192.168.1.108  Down   75.77 GB  26864397388495141852371679534086964062
> 192.168.1.109  Up     76.14 GB  35819196517993522469828906045449285416
> 192.168.1.110  Up     75.9 GB   40296596082742712778557519301130446093
> 192.168.1.111  Up     95.21 GB  49251395212241093396014745812492767447
>
> 2. nodetool removetoken 26864397388495141852371679534086964062
>
> 192.168.1.104  Up    111.75 GB  8954799129498380617457226511362321354
> 192.168.1.106  Up    113.25 GB  17909598258996761234914453022724642708
> 192.168.1.107  Up     75.65 GB  22386997823745951543643066278405803385
> 192.168.1.109  Up     76.14 GB  35819196517993522469828906045449285416
> 192.168.1.110  Up     75.9 GB   40296596082742712778557519301130446093
> 192.168.1.111  Up     95.21 GB  49251395212241093396014745812492767447
>
> At this point we're expecting 192.168.1.107 to pick up the slack for the
> removed token, and for 192.168.1.109 and/or 192.168.1.110 to start streaming
> data to 192.168.1.107 since they are holding the replicated data for that
> range.
>
> 3. nodetool repair ?
>
> On Tue, Jun 22, 2010 at 12:03 AM, Benjamin Black <b...@b3k.us> wrote:
>>
>> Did you forget to run repair?
>>
>> On Mon, Jun 21, 2010 at 7:02 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> > I believe we did nodetool removetoken on nodes that were already down
>> > (due to hardware failure), but I will check to make sure. We're running
>> > Cassandra 0.6.2.
>> >
>> > On Mon, Jun 21, 2010 at 9:59 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> >>
>> >> Greg, can you describe the steps we took to decommission the nodes?
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: Rob Coli <rc...@digg.com>
>> >> Date: Mon, Jun 21, 2010 at 8:08 PM
>> >> Subject: Re: get_range_slices confused about token ranges after decommissioning a node
>> >> To: user@cassandra.apache.org
>> >>
>> >> On 6/21/10 4:57 PM, Joost Ouwerkerk wrote:
>> >>>
>> >>> We're seeing very strange behaviour after decommissioning a node: when
>> >>> requesting a get_range_slices with a KeyRange by token, we are getting
>> >>> back tokens that are out of range.
>> >>
>> >> What sequence of actions did you take to "decommission" the node? What
>> >> version of Cassandra are you running?
>> >>
>> >> =Rob
>> >
>>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
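[Editor's note] The before-and-after ring states quoted above can be modeled with a small sketch. This is a minimal Python model, not Cassandra source, assuming the RandomPartitioner convention that each node's primary range is (previous_token, token], wrapping around the ring; the tokens and IPs come from the nodetool ring listing in the thread, everything else is illustrative. Note that under this convention it is the successor on the ring, 192.168.1.109, that absorbs the removed node's primary range:

```python
# Minimal sketch (NOT Cassandra code) of primary range ownership on the ring,
# using the tokens and IPs from the nodetool ring listing above.
# Assumption: a node owns the range (previous_token, token], RandomPartitioner-style.

ring = {
    8954799129498380617457226511362321354:  "192.168.1.104",
    17909598258996761234914453022724642708: "192.168.1.106",
    22386997823745951543643066278405803385: "192.168.1.107",
    26864397388495141852371679534086964062: "192.168.1.108",  # the Down node
    35819196517993522469828906045449285416: "192.168.1.109",
    40296596082742712778557519301130446093: "192.168.1.110",
    49251395212241093396014745812492767447: "192.168.1.111",
}

def primary_owner(ring, key_token):
    """First node clockwise whose token is >= key_token (wrapping around)."""
    tokens = sorted(ring)
    for t in tokens:
        if key_token <= t:
            return ring[t]
    return ring[tokens[0]]  # keys past the highest token wrap to the first node

removed = 26864397388495141852371679534086964062
# An example key token that falls in 192.168.1.108's primary range
key = 25_000_000_000_000_000_000_000_000_000_000_000_000

print(primary_owner(ring, key))   # -> 192.168.1.108

after = {t: n for t, n in ring.items() if t != removed}
print(primary_owner(after, key))  # -> 192.168.1.109, the ring successor
```

Which node must then receive streamed replicas depends on the replication factor and strategy, which the thread does not state; the sketch only shows how primary ownership of the removed range moves.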