I see this in the debug logs on the old nodes:

DEBUG [WRITE-/10.220.198.15] 2010-07-20 21:15:50,366 OutboundTcpConnection.java (line 142) attempting to connect to /10.220.198.15
INFO [GMFD:1] 2010-07-20 21:15:50,391 Gossiper.java (line 586) Node /10.220.198.15 is now part of the cluster
INFO [GMFD:1] 2010-07-20 21:15:51,369 Gossiper.java (line 578) InetAddress /10.220.198.15 is now UP
INFO [HINTED-HANDOFF-POOL:1] 2010-07-20 21:15:51,369 HintedHandOffManager.java (line 153) Started hinted handoff for endPoint /10.220.198.15
INFO [HINTED-HANDOFF-POOL:1] 2010-07-20 21:15:51,371 HintedHandOffManager.java (line 210) Finished hinted handoff of 0 rows to endpoint /10.220.198.15
DEBUG [GMFD:1] 2010-07-20 21:17:20,551 StorageService.java (line 512) Node /10.220.198.15 state bootstrapping, token 28356863910078205288614550619314017621
DEBUG [GMFD:1] 2010-07-20 21:17:20,656 StorageService.java (line 746) Pending ranges:
/10.220.198.15:(21604748163853165203168832909938143241,28356863910078205288614550619314017621]
/10.220.198.15:(10637639655367601517656788464652024082,21604748163853165203168832909938143241]

10.220.198.15 is the new node.

The key ranges appear to be its primary and replica ranges.
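
As a rough sanity check on those pending ranges, the sketch below parses the two `/ip:(left,right]` lines from the log and reports what fraction of the ring each covers. It assumes RandomPartitioner (token space 0 to 2**127), which this thread does not confirm, and the names `parse_pending_range` and `ring_fraction` are made up for illustration; they are not part of Cassandra.

```python
import re

# Assumption: RandomPartitioner, whose tokens lie in [0, 2**127].
RING_SIZE = 2 ** 127

# Matches a pending-range log line such as "/10.0.0.1:(123,456]".
RANGE_RE = re.compile(r"^/(?P<ip>[\d.]+):\((?P<left>\d+),(?P<right>\d+)\]$")


def parse_pending_range(line):
    """Return (ip, left, right) parsed from a '/ip:(left,right]' line."""
    match = RANGE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a pending-range line: %r" % line)
    return match.group("ip"), int(match.group("left")), int(match.group("right"))


def ring_fraction(left, right, ring_size=RING_SIZE):
    """Fraction of the ring covered by (left, right], allowing wraparound."""
    width = (right - left) % ring_size
    if width == 0:
        # left == right is treated here as the whole ring.
        width = ring_size
    return width / ring_size


lines = [
    "/10.220.198.15:(21604748163853165203168832909938143241,28356863910078205288614550619314017621]",
    "/10.220.198.15:(10637639655367601517656788464652024082,21604748163853165203168832909938143241]",
]

ranges = [parse_pending_range(line) for line in lines]
for ip, left, right in ranges:
    print(ip, "covers %.4f of the ring" % ring_fraction(left, right))
```

The two ranges are contiguous (the second ends where the first begins), which is consistent with one being the new node's own range and the other a range it would replicate.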

So after that, I would expect some AntiCompaction to happen on some of the 
other nodes, but I don't see anything.

Any clues from that output?

I did not muck around with the Location tables.

-Anthony

On Mon, Jul 19, 2010 at 09:36:22PM -0500, Jonathan Ellis wrote:
> What gets logged on the old nodes at debug, when you try to add a
> single new machine after a full cluster restart?
> 
> Removing Location would blow away the nodes' token information...  It
> should be safe if you set the InitialToken to what it used to be on
> each machine before bringing it up after nuking those.  Better
> snapshot the system keyspace first, just in case.
> 
> On Sun, Jul 18, 2010 at 2:01 PM, Anthony Molinaro
> <antho...@alumni.caltech.edu> wrote:
> > Yeah, I tried all that already and it didn't seem to work, no new nodes
> > will bootstrap, which makes me think there's some saved state somewhere,
> > preventing a new node from bootstrapping.  I think maybe the Location
> > sstables?  Is it safe to nuke those on all hosts and restart everything?
> > (I just don't want to lose actual data).
> >
> > Thanks for the ideas,
> >
> > -Anthony
> >
> > On Sun, Jul 18, 2010 at 08:09:45PM +0300, shimi wrote:
> >> If I have problems with never-ending bootstrapping I do the following. I try
> >> each one in turn; if it doesn't help I try the next. It might not be the right
> >> thing to do but it worked for me.
> >>
> >> 1. Restart the bootstrapping node
> >> 2. If I see streaming 0/xxxx I restart the node and all the streaming nodes
> >> 3. Restart all the nodes
> >> 4. If there is data in the bootstrapping node I delete it before I restart.
> >>
> >> Good luck
> >> Shimi
> >>
> >> On Sun, Jul 18, 2010 at 12:21 AM, Anthony Molinaro <
> >> antho...@alumni.caltech.edu> wrote:
> >>
> >> > So still waiting for any sort of answer on this one.  The cluster still
> >> > refuses to do anything when I bring up new nodes.  I shut down all the
> >> > new nodes and am waiting.  I'm guessing that maybe the old nodes have
> >> > some state which needs to get cleared out?  Is there anything I can do
> >> > at this point?  Are there alternate strategies for bootstrapping I can
> >> > try?  (For instance can I just scp all the sstables to all the new
> >> > nodes and do a repair, would that actually work?).
> >> >
> >> > Anyone seen this sort of issue?  All this is with 0.6.3 so I assume
> >> > eventually others will see this issue.
> >> >
> >> > -Anthony
> >> >
> >> > On Thu, Jul 15, 2010 at 10:45:08PM -0700, Anthony Molinaro wrote:
> >> > > Okay, so things were pretty messed up.  I shut down all the new nodes,
> >> > > then the old nodes started doing the half the ring is down garbage which
> >> > > pretty much requires a full restart of everything.  So I had to shut
> >> > > everything down, then bring the seed back, then the rest of the nodes,
> >> > > so they finally all agreed on the ring again.
> >> > >
> >> > > Then I started one of the new nodes, and have been watching the logs, so
> >> > > far 2 hours since the "Bootstrapping" message appeared in the new
> >> > > log and nothing has happened.  No anticompaction messages anywhere, there's
> >> > > one node compacting, but it's on the other end of the ring, so nowhere near
> >> > > that new node.  I'm wondering if it will ever get data at this point.
> >> > >
> >> > > Is there something else I should try?  The only thing I can think of
> >> > > is deleting the system directory on the new node, and restarting, so
> >> > > I'll try that and see if it does anything.
> >> > >
> >> > > -Anthony
> >> > >
> >> > > On Thu, Jul 15, 2010 at 03:43:49PM -0500, Jonathan Ellis wrote:
> >> > > > On Thu, Jul 15, 2010 at 3:28 PM, Anthony Molinaro
> >> > > > <antho...@alumni.caltech.edu> wrote:
> >> > > > > Is the fact that 2 new nodes are in the range messing it up?
> >> > > >
> >> > > > Probably.
> >> > > >
> >> > > > >  And if so
> >> > > > > how do I recover (I'm thinking, shut down new nodes 2,3,4,5, then bringing
> >> > > > > up nodes 2,4, waiting for them to finish, then bringing up 3,5?).
> >> > > >
> >> > > > Yes.
> >> > > >
> >> > > > You might have to restart the old nodes too to clear out the confusion.
> >> > > >
> >> > > > --
> >> > > > Jonathan Ellis
> >> > > > Project Chair, Apache Cassandra
> >> > > > co-founder of Riptano, the source for professional Cassandra support
> >> > > > http://riptano.com
> >> > >
> >> > > --
> >> > > ------------------------------------------------------------------------
> >> > > Anthony Molinaro                           <antho...@alumni.caltech.edu>
> >> >
> >> > --
> >> > ------------------------------------------------------------------------
> >> > Anthony Molinaro                           <antho...@alumni.caltech.edu>
> >> >
> >
> > --
> > ------------------------------------------------------------------------
> > Anthony Molinaro                           <antho...@alumni.caltech.edu>
> >
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <antho...@alumni.caltech.edu>
