Just a follow-up on this for any interested parties. Ultimately we've determined that the bootstrap/join process is broken in Cassandra. We ended up creating an entirely new cluster and migrating the data.
On Mon, Apr 21, 2014 at 10:32 AM, Phil Burress <philburress...@gmail.com> wrote:

> The new node has managed to stay up without dying for about 24 hours
> now... but it still is in JOINING state. A new concern has popped up. Disk
> usage is at 500GB on the new node. The three original nodes have about 40GB
> each. Any ideas why this is happening?
>
>
> On Sat, Apr 19, 2014 at 9:19 PM, Phil Burress <philburress...@gmail.com> wrote:
>
>> Thank you all for your advice and good info. The node has died a couple
>> of times with out of memory errors. I've restarted each time but it starts
>> re-running compaction and then dies again.
>>
>> Is there a better way to do this?
>> On Apr 18, 2014 6:06 PM, "Steven A Robenalt" <srobe...@stanford.edu> wrote:
>>
>>> That's what I'd be doing, but I wouldn't expect it to run for 3 days
>>> this time. My guess is that whatever was going wrong with the bootstrap
>>> when you had 3 nodes starting at once was interfering with the completion
>>> of the 1 remaining node of those 3. A clean bootstrap of a single node
>>> should complete eventually, and I would think it'll be a lot less than 3
>>> days. Our database is much smaller than yours at the moment, so I can't
>>> really guide you on how long it should take, but I'd think that others on
>>> the list with similar database sizes might be able to give you a better
>>> idea.
>>>
>>> Steve
>>>
>>>
>>> On Fri, Apr 18, 2014 at 1:43 PM, Phil Burress <philburress...@gmail.com> wrote:
>>>
>>>> First, I just stopped 2 of the nodes and left one running. But this
>>>> morning, I stopped that third node, cleared out the data, restarted and let
>>>> it rejoin again. It appears streaming is done (according to netstats),
>>>> right now it appears to be running compaction and building secondary index
>>>> (according to compactionstats). Just sit and wait I guess?
>>>>
>>>>
>>>> On Fri, Apr 18, 2014 at 2:23 PM, Steven A Robenalt <srobe...@stanford.edu> wrote:
>>>>
>>>>> Looking back through this email chain, it looks like Phil said he
>>>>> wasn't using vnodes.
>>>>>
>>>>> For the record, we are using vnodes since we brought up our first
>>>>> cluster, and have not seen any issues with bootstrapping new nodes either
>>>>> to replace existing nodes, or to grow/shrink the cluster. We did adhere to
>>>>> the caveats that new nodes should not be seed nodes, and that we should
>>>>> allow each node to join the cluster completely before making any other
>>>>> changes.
>>>>>
>>>>> Phil, when you dropped to adding just the single node to your cluster,
>>>>> did you start over with the newly added node (blowing away the database
>>>>> created on the previous startup), or did you shut down the other 2 added
>>>>> nodes and leave the remaining one in progress to continue?
>>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>> On Fri, Apr 18, 2014 at 10:40 AM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>>
>>>>>> On Fri, Apr 18, 2014 at 5:05 AM, Phil Burress <philburress...@gmail.com> wrote:
>>>>>>
>>>>>>> nodetool netstats shows 84 files. They are all at 100%. Nothing
>>>>>>> showing in Pending or Active for Read Repair Stats.
>>>>>>>
>>>>>>> I'm assuming this means it's done. But it still shows "JOINING". Is
>>>>>>> there an undocumented step I'm missing here? This whole process seems
>>>>>>> broken to me.
>>>>>>>
>>>>>>
>>>>>> Lately it seems like a lot more people than usual are:
>>>>>>
>>>>>> 1) using vnodes
>>>>>> 2) unable to bootstrap new nodes
>>>>>>
>>>>>> If I were you, I would likely file a JIRA detailing your negative
>>>>>> experience with this core functionality.
>>>>>>
>>>>>> =Rob
>>>>>>
>>>>>
>>>>> --
>>>>> Steve Robenalt
>>>>> Software Architect
>>>>> HighWire | Stanford University
>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>
>>>>> srobe...@stanford.edu
>>>>> http://highwire.stanford.edu