I see this in the old nodes:

DEBUG [WRITE-/10.220.198.15] 2010-07-20 21:15:50,366 OutboundTcpConnection.java (line 142) attempting to connect to /10.220.198.15
INFO [GMFD:1] 2010-07-20 21:15:50,391 Gossiper.java (line 586) Node /10.220.198.15 is now part of the cluster
INFO [GMFD:1] 2010-07-20 21:15:51,369 Gossiper.java (line 578) InetAddress /10.220.198.15 is now UP
INFO [HINTED-HANDOFF-POOL:1] 2010-07-20 21:15:51,369 HintedHandOffManager.java (line 153) Started hinted handoff for endPoint /10.220.198.15
INFO [HINTED-HANDOFF-POOL:1] 2010-07-20 21:15:51,371 HintedHandOffManager.java (line 210) Finished hinted handoff of 0 rows to endpoint /10.220.198.15
DEBUG [GMFD:1] 2010-07-20 21:17:20,551 StorageService.java (line 512) Node /10.220.198.15 state bootstrapping, token 28356863910078205288614550619314017621
DEBUG [GMFD:1] 2010-07-20 21:17:20,656 StorageService.java (line 746) Pending ranges:
/10.220.198.15:(21604748163853165203168832909938143241,28356863910078205288614550619314017621]
/10.220.198.15:(10637639655367601517656788464652024082,21604748163853165203168832909938143241]

10.220.198.15 is the new node. The key ranges seem to be for the
primary and replica ranges. So after that, I would expect some
AntiCompaction to happen on some of the other nodes, but I don't see
anything. Any clues from that output? I did not muck around with the
Location tables.

-Anthony
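Those two pending ranges are just the new node's primary range plus the
replica range it inherits from its predecessor, which matches the reading
above. A minimal sketch of that arithmetic, assuming RandomPartitioner and
RackUnawareStrategy with a replication factor of 2; only the token values
come from the log, everything else here is illustrative:

    # Sketch only: how a bootstrapping node's pending ranges can be derived.
    # Assumes RandomPartitioner and RackUnawareStrategy with rf=2; the tokens
    # are the ones from the log above, the function itself is illustrative.
    def pending_ranges(existing_tokens, new_token, rf=2):
        # The new node picks up its primary range (predecessor_token, new_token]
        # plus, for rf > 1, the ranges of the rf - 1 nodes immediately before it
        # on the ring (the ranges it will hold as a replica).
        ring = sorted(existing_tokens + [new_token])
        i = ring.index(new_token)
        # Python's negative indexing conveniently wraps around the ring.
        return [(ring[i - k - 1], ring[i - k]) for k in range(rf)]

    neighbours = [10637639655367601517656788464652024082,
                  21604748163853165203168832909938143241]
    new_token = 28356863910078205288614550619314017621

    for left, right in pending_ranges(neighbours, new_token):
        print("/10.220.198.15:(%d,%d]" % (left, right))

Running that prints the same two ranges the DEBUG line shows.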
On Mon, Jul 19, 2010 at 09:36:22PM -0500, Jonathan Ellis wrote:
> What gets logged on the old nodes at debug, when you try to add a
> single new machine after a full cluster restart?
>
> Removing Location would blow away the nodes' token information... It
> should be safe if you set the InitialToken to what it used to be on
> each machine before bringing it up after nuking those. Better
> snapshot the system keyspace first, just in case.
>
> On Sun, Jul 18, 2010 at 2:01 PM, Anthony Molinaro
> <antho...@alumni.caltech.edu> wrote:
> > Yeah, I tried all that already and it didn't seem to work; no new nodes
> > will bootstrap, which makes me think there's some saved state somewhere
> > preventing a new node from bootstrapping. I think maybe the Location
> > sstables? Is it safe to nuke those on all hosts and restart everything?
> > (I just don't want to lose actual data.)
> >
> > Thanks for the ideas,
> >
> > -Anthony
> >
> > On Sun, Jul 18, 2010 at 08:09:45PM +0300, shimi wrote:
> >> If I have problems with never-ending bootstrapping I do the following.
> >> I try each one; if it doesn't help I try the next. It might not be the
> >> right thing to do, but it worked for me.
> >>
> >> 1. Restart the bootstrapping node.
> >> 2. If I see streaming 0/xxxx I restart the node and all the streaming nodes.
> >> 3. Restart all the nodes.
> >> 4. If there is data in the bootstrapping node I delete it before I restart.
> >>
> >> Good luck
> >> Shimi
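The "streaming 0/xxxx" in step 2 above is the kind of stalled progress you
would typically spot in nodetool streams output. A rough way to eyeball every
node at once before restarting anything; this is only a sketch, and the
nodetool path and host list are placeholders rather than values from this
thread:

    # Rough sketch: dump "nodetool streams" for every node so a transfer
    # stuck at 0/xxxx is easy to spot before restarting anything.
    # NODETOOL and HOSTS are placeholders, not values taken from this thread.
    import subprocess

    NODETOOL = "/opt/cassandra/bin/nodetool"     # assumed install location
    HOSTS = ["10.220.198.15", "10.220.198.16"]   # fill in your own ring

    for host in HOSTS:
        print("==== %s ====" % host)
        try:
            result = subprocess.run([NODETOOL, "-h", host, "streams"],
                                    capture_output=True, text=True, timeout=30)
            print(result.stdout or result.stderr)
        except Exception as exc:
            print("nodetool failed for %s: %s" % (host, exc))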
> >> On Sun, Jul 18, 2010 at 12:21 AM, Anthony Molinaro
> >> <antho...@alumni.caltech.edu> wrote:
> >>
> >> > So still waiting for any sort of answer on this one. The cluster still
> >> > refuses to do anything when I bring up new nodes. I shut down all the
> >> > new nodes and am waiting. I'm guessing that maybe the old nodes have
> >> > some state which needs to get cleared out? Is there anything I can do
> >> > at this point? Are there alternate strategies for bootstrapping I can
> >> > try? (For instance, can I just scp all the sstables to all the new
> >> > nodes and do a repair; would that actually work?)
> >> >
> >> > Anyone seen this sort of issue? All this is with 0.6.3, so I assume
> >> > eventually others will see this issue.
> >> >
> >> > -Anthony
> >> >
> >> > On Thu, Jul 15, 2010 at 10:45:08PM -0700, Anthony Molinaro wrote:
> >> > > Okay, so things were pretty messed up. I shut down all the new nodes,
> >> > > then the old nodes started doing the "half the ring is down" garbage,
> >> > > which pretty much requires a full restart of everything. So I had to
> >> > > shut everything down, then bring the seed back, then the rest of the
> >> > > nodes, so they finally all agreed on the ring again.
> >> > >
> >> > > Then I started one of the new nodes, and have been watching the logs;
> >> > > so far it has been 2 hours since the "Bootstrapping" message appeared
> >> > > in the new node's log and nothing has happened. No anticompaction
> >> > > messages anywhere; there's one node compacting, but it's on the other
> >> > > end of the ring, so nowhere near that new node. I'm wondering if it
> >> > > will ever get data at this point.
> >> > >
> >> > > Is there something else I should try? The only thing I can think of
> >> > > is deleting the system directory on the new node and restarting, so
> >> > > I'll try that and see if it does anything.
> >> > >
> >> > > -Anthony
> >> > >
> >> > > On Thu, Jul 15, 2010 at 03:43:49PM -0500, Jonathan Ellis wrote:
> >> > > > On Thu, Jul 15, 2010 at 3:28 PM, Anthony Molinaro
> >> > > > <antho...@alumni.caltech.edu> wrote:
> >> > > > > Is the fact that 2 new nodes are in the range messing it up?
> >> > > >
> >> > > > Probably.
> >> > > >
> >> > > > > And if so how do I recover? (I'm thinking: shut down new nodes
> >> > > > > 2,3,4,5, then bring up nodes 2,4, wait for them to finish, then
> >> > > > > bring up 3,5?)
> >> > > >
> >> > > > Yes.
> >> > > >
> >> > > > You might have to restart the old nodes too to clear out the
> >> > > > confusion.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

--
------------------------------------------------------------------------
Anthony Molinaro <antho...@alumni.caltech.edu>
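A footnote on the 2,4-then-3,5 bring-up order in the quoted exchange, and on
setting InitialToken by hand as Jonathan suggests earlier in the thread: if
you end up assigning tokens yourself instead of letting bootstrap choose
them, the usual arithmetic is simply to split the RandomPartitioner token
space evenly. A minimal sketch; none of the numbers below come from this
cluster:

    # Standard RandomPartitioner arithmetic for hand-picking InitialToken
    # values, e.g. when doubling a ring with explicit tokens rather than
    # relying on bootstrap. The node count here is an example, not the size
    # of the cluster discussed in this thread.
    def balanced_tokens(node_count):
        ring = 2 ** 127                  # RandomPartitioner token space
        return [i * ring // node_count for i in range(node_count)]

    for i, token in enumerate(balanced_tokens(8)):
        print("node %d: InitialToken = %d" % (i, token))

Existing nodes keep the tokens they already have; only the newcomers need
tokens chosen, typically midway through the ranges you want to split.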