I don't mind missing data for a few hours; it's the weird behaviour of get_range_slices that's bothering me. I added some logging to ColumnFamilyRecordReader to see what's going on:

Split startToken=67160993471237854630929198835217410155, endToken=68643623863384825230116928934887817211
...
Getting batch for range: 67965855060996012099315582648654139032 to 68643623863384825230116928934887817211
Token for last row is: 50448492574454416067449808504057295946
Getting batch for range: 50448492574454416067449808504057295946 to 68643623863384825230116928934887817211
...

Notice how the get_range_slices response is invalid: the token of the last row returned lies outside the requested range. That out-of-range token poisons the batching loop, which restarts from a point far behind the split, and the task spins out of control.
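For anyone reading along, here is the shape of the loop as I understand it, reduced to a self-contained sketch rather than the actual 0.6 source: fetchBatch stands in for the Thrift get_range_slices call, BATCH_SIZE is a made-up constant, and tokens are modelled as the RandomPartitioner's BigIntegers. The check in the middle shows where the bad response could at least be detected instead of silently restarting the scan:

    import java.math.BigInteger;
    import java.util.List;

    public class RangeBatchSketch {

        // Hypothetical batch size; the real reader fetches a configurable
        // number of rows per get_range_slices call.
        static final int BATCH_SIZE = 4096;

        // Stand-in for the Thrift get_range_slices call: returns the tokens
        // of at most BATCH_SIZE rows in the range (start, end].
        static List<BigInteger> fetchBatch(BigInteger start, BigInteger end) {
            throw new UnsupportedOperationException("stub for get_range_slices");
        }

        // The batching loop, roughly as ColumnFamilyRecordReader drives it:
        // after each full batch, resume from the token of the last row seen.
        static void readSplit(BigInteger startToken, BigInteger endToken) {
            BigInteger start = startToken;
            while (true) {
                List<BigInteger> rows = fetchBatch(start, endToken);
                if (rows.isEmpty()) {
                    break; // range exhausted
                }
                BigInteger last = rows.get(rows.size() - 1);
                // If the server hands back a row outside (start, endToken],
                // 'last' can jump behind 'start'; the next call then covers
                // a huge, wrong slice of the ring and the loop never ends.
                // A defensive check like this would surface the bad response:
                if (last.compareTo(start) <= 0 || last.compareTo(endToken) > 0) {
                    throw new IllegalStateException("get_range_slices returned "
                            + "out-of-range token " + last + " for range ("
                            + start + ", " + endToken + "]");
                }
                if (rows.size() < BATCH_SIZE) {
                    break; // short batch means we've read the whole range
                }
                start = last; // resume after the last row
            }
        }
    }

With the numbers from the log above: the batch starting at token 67965... returns a last row at token 50448..., which fails the lower-bound check; without such a check the loop restarts from far behind the split and never terminates.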
/joost

On Tue, Jun 22, 2010 at 9:09 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> What I would expect to have happen is for the removed node to
> disappear from the ring and for nodes that are supposed to get more
> data to start streaming it over. I would expect it to be hours before
> any new data started appearing anywhere when you are anticompacting
> 80+GB prior to the streaming part.
> http://wiki.apache.org/cassandra/Streaming
>
> On Tue, Jun 22, 2010 at 12:57 AM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> > Yes, although "forget" implies that we once knew we were supposed to do so.
> > Given the following before-and-after states, on which nodes are we supposed
> > to run repair? Should the cluster be restarted? Is there anything else we
> > should be doing, or not doing?
> >
> > 1. Node is down due to hardware failure
> >
> > 192.168.1.104  Up    111.75 GB  8954799129498380617457226511362321354   | ^
> > 192.168.1.106  Up    113.25 GB  17909598258996761234914453022724642708  v |
> > 192.168.1.107  Up    75.65 GB   22386997823745951543643066278405803385  | ^
> > 192.168.1.108  Down  75.77 GB   26864397388495141852371679534086964062  v |
> > 192.168.1.109  Up    76.14 GB   35819196517993522469828906045449285416  | ^
> > 192.168.1.110  Up    75.9 GB    40296596082742712778557519301130446093  v |
> > 192.168.1.111  Up    95.21 GB   49251395212241093396014745812492767447  | ^
> >
> > 2. nodetool removetoken 26864397388495141852371679534086964062
> >
> > 192.168.1.104  Up    111.75 GB  8954799129498380617457226511362321354   | ^
> > 192.168.1.106  Up    113.25 GB  17909598258996761234914453022724642708  v |
> > 192.168.1.107  Up    75.65 GB   22386997823745951543643066278405803385  | ^
> > 192.168.1.109  Up    76.14 GB   35819196517993522469828906045449285416  | ^
> > 192.168.1.110  Up    75.9 GB    40296596082742712778557519301130446093  v |
> > 192.168.1.111  Up    95.21 GB   49251395212241093396014745812492767447  | ^
> >
> > At this point we're expecting 192.168.1.107 to pick up the slack for the
> > removed token, and for 192.168.1.109 and/or 192.168.1.110 to start streaming
> > data to 192.168.1.107 since they are holding the replicated data for that
> > range.
> >
> > 3. nodetool repair ?
> >
> > On Tue, Jun 22, 2010 at 12:03 AM, Benjamin Black <b...@b3k.us> wrote:
> >>
> >> Did you forget to run repair?
> >>
> >> On Mon, Jun 21, 2010 at 7:02 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> >> > I believe we did nodetool removetoken on nodes that were already down
> >> > (due to hardware failure), but I will check to make sure. We're running
> >> > Cassandra 0.6.2.
> >> >
> >> > On Mon, Jun 21, 2010 at 9:59 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> >> >>
> >> >> Greg, can you describe the steps we took to decommission the nodes?
> >> >>
> >> >> ---------- Forwarded message ----------
> >> >> From: Rob Coli <rc...@digg.com>
> >> >> Date: Mon, Jun 21, 2010 at 8:08 PM
> >> >> Subject: Re: get_range_slices confused about token ranges after decommissioning a node
> >> >> To: user@cassandra.apache.org
> >> >>
> >> >> On 6/21/10 4:57 PM, Joost Ouwerkerk wrote:
> >> >>>
> >> >>> We're seeing very strange behaviour after decommissioning a node: when
> >> >>> requesting a get_range_slices with a KeyRange by token, we are getting
> >> >>> back tokens that are out of range.
> >> >>
> >> >> What sequence of actions did you take to "decommission" the node? What
> >> >> version of Cassandra are you running?
> >> >>
> >> >> =Rob
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com