Turns out I needed to shut everything down completely, then start it all back up. A rolling restart was still resulting in some nodes being confused about what ring they were in.
I think the moral of all this is that any change to the seed node must be followed by a full restart of your cluster. Also, any use of removetoken is perilous. The good news is I'm off of the old nodes. I'll need to figure out a way to bulk load the data from some of the old sstables, but I think sstable2json and a quick perl script to load it might work out. After that, I'll upgrade to 0.6.x.

-Anthony

On Fri, Apr 23, 2010 at 02:22:11PM -0700, Anthony Molinaro wrote:
>
> On Fri, Apr 23, 2010 at 01:17:21PM -0500, Jonathan Ellis wrote:
> > On Fri, Apr 23, 2010 at 1:12 PM, Anthony Molinaro
> > <antho...@alumni.caltech.edu> wrote:
> > > I'm not sure how it would get this, maybe I need to restart my seed node?
> >
> > It's worth a try. Sounds like you found an unusual bug in gossip.
>
> Damn, restarting the seed, resulted in the seed coming up in a new ring
> with 3 nodes which have been decommissioned. Seems like restarting other
> nodes brings them into that ring (or at least the first few seem to be in
> the new ring). I'll restart them all to see if I can't get to a consistent
> ring. You know what might have happened, I changed the ip of the seed host
> in my /etc/hosts before starting to decommission, I bet I should have then
> restarted everything. Oh well, hopefully most of my data is still viable.
>
> I do still have all the old sstables lying around, can I just sstable2json
> then json2sstable and have it reload them? Or do the sstables need to be
> keyed to the keyrange? I guess I can sstable2json then create an import
> script to insert them via thrift?
>
> > > When I run nodeprobe ring on the seed I don't see any of the hosts I
> > > decommissioned, but maybe they are still listed there somewhere?
> >
> > 0.5 does leave decommissioned host information in gossip, but I'm not
> > sure how that applies to this problem.
>
> I bet that was a red herring, I'm pretty convinced now this was all a
> result of me not restarting all the nodes after making a change to the
> seed.
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <antho...@alumni.caltech.edu>

--
------------------------------------------------------------------------
Anthony Molinaro <antho...@alumni.caltech.edu>
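
For the bulk load idea mentioned above (sstable2json, then a small script that re-inserts the data), here is a rough sketch of what the loader half might look like, in Python rather than perl. Everything in it is an assumption to check before use: the function names `iter_columns` and `bulk_load` are made up for illustration, the `insert` callback is a hypothetical stand-in for whatever Thrift client call you actually use, and the JSON layout (row key mapped to a list of [hex name, hex value, timestamp, deleted flag] entries) is assumed and should be verified against the real sstable2json output for your version.

```python
import json
from typing import Callable, Iterator, Tuple

# (row key, column name, column value, timestamp)
Column = Tuple[str, bytes, bytes, int]


def iter_columns(dump_text: str) -> Iterator[Column]:
    """Yield live columns from a JSON dump.

    ASSUMED layout (verify against your own sstable2json output):
      {"rowkey": [["<hex name>", "<hex value>", <timestamp>, <deleted>], ...]}
    """
    rows = json.loads(dump_text)
    for row_key, columns in rows.items():
        for column in columns:
            name_hex, value_hex, timestamp = column[0], column[1], column[2]
            deleted = bool(column[3]) if len(column) > 3 else False
            if deleted:
                continue  # skip tombstones instead of re-inserting them
            yield (row_key,
                   bytes.fromhex(name_hex),
                   bytes.fromhex(value_hex),
                   int(timestamp))


def bulk_load(dump_text: str,
              insert: Callable[[str, bytes, bytes, int], None]) -> int:
    """Pass every live column to `insert` and return how many were sent.

    `insert` is a hypothetical callback; wire it to your real client.
    """
    count = 0
    for row_key, name, value, timestamp in iter_columns(dump_text):
        insert(row_key, name, value, timestamp)
        count += 1
    return count


if __name__ == "__main__":
    # Tiny hand-written sample in the assumed layout, not real tool output.
    sample = ('{"user1": [["6e616d65", "416e74686f6e79", 1272000000, false],'
              ' ["6f6c64", "78", 1271000000, true]]}')
    loaded = []
    sent = bulk_load(sample, lambda *col: loaded.append(col))
    print(sent, loaded)
```

Keeping the insert step as a callback means the parsing can be tried against a dump file with a dry-run function first, before pointing it at a live cluster. Passing the original timestamps through should let the re-inserted columns reconcile normally against anything newer already in the ring.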