Turns out I needed to shut everything down completely, then start it all
back up; a rolling restart was still leaving some nodes confused about
which ring they were in.

I think the moral of all this is that any change to the seed node must be
followed by a full restart of your cluster.  Also, any use of removetoken
is perilous.

The good news is I'm off of the old nodes.  I'll need to figure out a way
to bulk load the data from some of the old sstables, but I think
sstable2json and a quick perl script to load might work out.
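
A rough sketch of what such a loader could look like, written in Python
rather than perl.  It rests on assumptions I have not checked against 0.5:
that the sstable2json dump is a single JSON object mapping each row key to
a list of [hex column name, hex value, timestamp] entries, possibly with a
fourth "deleted" flag.  The insert callback is hypothetical and stands in
for whatever thrift client call does the actual write.

```python
import json
from typing import Callable, Iterator, Tuple

# (row key, column name, column value, timestamp)
Column = Tuple[str, bytes, bytes, int]


def iter_columns(dump: dict) -> Iterator[Column]:
    """Yield one tuple per live column from a parsed sstable2json dump.

    ASSUMPTION: dump maps row key -> list of
    [hex name, hex value, timestamp, optional deleted flag].
    """
    for row_key, columns in dump.items():
        for col in columns:
            name_hex, value_hex, timestamp = col[0], col[1], col[2]
            # ASSUMPTION: a truthy 4th element marks a deleted column; skip it.
            if len(col) > 3 and col[3]:
                continue
            yield (row_key,
                   bytes.fromhex(name_hex),
                   bytes.fromhex(value_hex),
                   int(timestamp))


def load_dump(path: str,
              insert: Callable[[str, bytes, bytes, int], None]) -> int:
    """Read a dump file and pass every live column to insert().

    insert is a hypothetical callback wrapping the real client write.
    Returns the number of columns handed to it.
    """
    with open(path) as fh:
        dump = json.load(fh)
    count = 0
    for row_key, name, value, ts in iter_columns(dump):
        insert(row_key, name, value, ts)
        count += 1
    return count
```

Passing the original timestamps through to the write should keep a reload
from clobbering anything newer already in the cluster.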

Then, after that, upgrade to 0.6.x.

-Anthony

On Fri, Apr 23, 2010 at 02:22:11PM -0700, Anthony Molinaro wrote:
> 
> On Fri, Apr 23, 2010 at 01:17:21PM -0500, Jonathan Ellis wrote:
> > On Fri, Apr 23, 2010 at 1:12 PM, Anthony Molinaro
> > <antho...@alumni.caltech.edu> wrote:
> > > I'm not sure how it would get this, maybe I need to restart my seed node?
> > 
> > It's worth a try.  Sounds like you found an unusual bug in gossip.
> 
> Damn, restarting the seed resulted in the seed coming up in a new ring
> with 3 nodes which have been decommissioned.  Seems like restarting other
> nodes brings them into that ring (or at least the first few seem to be in
> the new ring).  I'll restart them all to see if I can't get to a consistent
> ring.  You know what might have happened, I changed the ip of the seed host
> in my /etc/hosts before starting to decommission, I bet I should have then
> restarted everything.  Oh well, hopefully most of my data is still viable.
> 
> I do still have all the old sstables lying around, can I just sstable2json
> then json2sstable and have it reload them?  Or do the sstables need to be
> keyed to the keyrange?  I guess I can sstable2json then create an import
> script to insert them via thrift?
> 
> > > When I run nodeprobe ring on the seed I don't see any of the hosts I
> > > decommissioned, but maybe they are still listed there somewhere?
> > 
> > 0.5 does leave decommissioned host information in gossip, but I'm not
> > sure how that applies to this problem.
> 
> I bet that was a red herring, I'm pretty convinced now this was all a
> result of me not restarting all the nodes after making a change to the
> seed.
> 
> -Anthony
> 
> -- 
> ------------------------------------------------------------------------
> Anthony Molinaro                           <antho...@alumni.caltech.edu>

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <antho...@alumni.caltech.edu>
