There is a bug where a node without a schema cannot bootstrap. Do you have a schema?

On Tue, Feb 18, 2014 at 1:29 PM, Arindam Barua <aba...@247-inc.com> wrote:

> The node is still out of the ring. Any suggestions on how to get it in
> will be very helpful.
>
> *From:* Arindam Barua [mailto:aba...@247-inc.com]
> *Sent:* Friday, February 14, 2014 1:04 AM
> *To:* user@cassandra.apache.org
> *Subject:* Bootstrap stuck: vnode enabled 1.2.12
>
> After our otherwise successful upgrade procedure to enable vnodes, when
> adding back "new" hosts to our cluster, one non-seed host ran into a
> hardware issue during bootstrap. By the time the hardware issue was fixed
> a week later, all other nodes were added successfully, cleaned, repaired.
> The disks on this node were untouched, and when the node was started back
> up, it detected an interrupted bootstrap, and attempted to bootstrap.
> However, after ~24 hrs it was still stuck in the 'JOINING' state according
> to nodetool netstats on that node, even though no streams were flowing
> to/from it. Also, it did not appear in nodetool status in any way/form
> (not even as JOINING).
>
> From couple of observed thread dumps, the stack of the thread blocked
> during bootstrap is at [1].
>
> Since the node wasn't making any progress, I ended up stopping Cassandra,
> cleaning up the data and commitlog directories, and attempted a fresh
> bootstrap. Nodetool netstats immediately reported a whole bunch of streams
> queued up, and data started streaming to the node. The data directory
> quickly grew to 18 GB (the other nodes had ~25GB, but we have lot of data
> with low TTLs). However, the node ended up being in the earlier reported
> state, i.e. nodetool netstats doesn't have anything queued, but still
> reports the JOINING state, even though it's been > 24 hrs. There are no
> other ERRORS in the logs, and new data being written to the cluster makes
> it to this node just fine, triggering compactions, etc from time to time.
>
> Any help is appreciated.
>
> Thanks,
> Arindam
>
> [1] Thread dump
> Thread 3708: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=811 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int) @bci=55, line=969 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int) @bci=24, line=1281 (Interpreted frame)
>  - java.util.concurrent.CountDownLatch.await() @bci=5, line=207 (Interpreted frame)
>  - org.apache.cassandra.dht.RangeStreamer.fetch() @bci=209, line=256 (Interpreted frame)
>  - org.apache.cassandra.dht.BootStrapper.bootstrap() @bci=120, line=84 (Interpreted frame)
>  - org.apache.cassandra.service.StorageService.bootstrap(java.util.Collection) @bci=172, line=978 (Interpreted frame)
>  - org.apache.cassandra.service.StorageService.joinTokenRing(int) @bci=827, line=744 (Interpreted frame)
>  - org.apache.cassandra.service.StorageService.initServer(int) @bci=363, line=585 (Interpreted frame)
>  - org.apache.cassandra.service.StorageService.initServer() @bci=4, line=482 (Interpreted frame)
>  - org.apache.cassandra.service.CassandraDaemon.setup() @bci=1069, line=348 (Interpreted frame)
>  - org.apache.cassandra.service.CassandraDaemon.activate() @bci=59, line=447 (Interpreted frame)
>  - org.apache.cassandra.service.CassandraDaemon.main(java.lang.String[]) @bci=3, line=490 (Interpreted frame)
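
A note on reading the dump above: the bootstrap thread is parked in CountDownLatch.await(), called from RangeStreamer.fetch(). An untimed await() only returns once every expected countDown() has happened, so a single completion signal that never arrives would leave the caller waiting indefinitely. Whether that is what happened on this node is an assumption, not something the dump proves. The sketch below is plain JDK code, not Cassandra's implementation; the class name, the helper awaitWithTimeout and the count of two "sessions" are made up for illustration. It uses a timed await so the stall is visible instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchStallDemo {

    // Hypothetical helper: waits up to `millis` for the latch to reach zero.
    // Returns true if it did, false if the wait timed out.
    static boolean awaitWithTimeout(CountDownLatch latch, long millis)
            throws InterruptedException {
        return latch.await(millis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Pretend two independent transfers must finish (illustrative count).
        CountDownLatch latch = new CountDownLatch(2);

        // The first transfer signals completion.
        latch.countDown();

        // The second transfer never signals. An untimed latch.await() here
        // would block forever; the timed variant reports the stall instead.
        boolean finished = awaitWithTimeout(latch, 200);

        System.out.println("finished=" + finished
                + ", still pending=" + latch.getCount());
    }
}
```

With one of the two signals missing, the timed wait gives up and one count remains pending, which is the same shape as a thread that stays BLOCKED under await() while no work is in flight.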