Interesting. I did our 2.0.3 -> 2.0.5 upgrade by bootstrapping/joining each node into our cluster, one at a time, then retiring the old nodes one at a time. Maybe something specific to the 2.0.6 release?
Good to hear that you've gotten through it anyway.

Steve

On Fri, Apr 25, 2014 at 7:49 AM, Phil Burress <philburress...@gmail.com> wrote:

> Cassandra 2.0.6
>
> On Fri, Apr 25, 2014 at 10:31 AM, James Rothering <jrother...@codojo.me> wrote:
>
>> What version of C* is this?
>>
>> On Fri, Apr 25, 2014 at 6:55 AM, Phil Burress <philburress...@gmail.com> wrote:
>>
>>> Just a follow-up on this for any interested parties. Ultimately we've
>>> determined that the bootstrap/join process is broken in Cassandra. We ended
>>> up creating an entirely new cluster and migrating the data.
>>>
>>> On Mon, Apr 21, 2014 at 10:32 AM, Phil Burress <philburress...@gmail.com> wrote:
>>>
>>>> The new node has managed to stay up without dying for about 24 hours
>>>> now, but it is still in the JOINING state. A new concern has popped up:
>>>> disk usage is at 500GB on the new node, while the three original nodes
>>>> have about 40GB each. Any ideas why this is happening?
>>>>
>>>> On Sat, Apr 19, 2014 at 9:19 PM, Phil Burress <philburress...@gmail.com> wrote:
>>>>
>>>>> Thank you all for your advice and good info. The node has died a
>>>>> couple of times with out-of-memory errors. I've restarted it each time,
>>>>> but it starts re-running compaction and then dies again.
>>>>>
>>>>> Is there a better way to do this?
>>>>>
>>>>> On Apr 18, 2014 6:06 PM, "Steven A Robenalt" <srobe...@stanford.edu> wrote:
>>>>>
>>>>>> That's what I'd be doing, but I wouldn't expect it to run for 3 days
>>>>>> this time. My guess is that whatever was going wrong with the bootstrap
>>>>>> when you had 3 nodes starting at once was interfering with the completion
>>>>>> of the 1 remaining node of those 3. A clean bootstrap of a single node
>>>>>> should complete eventually, and I would think it'll take a lot less than
>>>>>> 3 days.
>>>>>>
>>>>>> Our database is much smaller than yours at the moment, so I can't
>>>>>> really guide you on how long it should take, but I'd think that others on
>>>>>> the list with similar database sizes might be able to give you a better
>>>>>> idea.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Fri, Apr 18, 2014 at 1:43 PM, Phil Burress <philburress...@gmail.com> wrote:
>>>>>>
>>>>>>> First, I just stopped 2 of the nodes and left one running. But this
>>>>>>> morning, I stopped that third node, cleared out the data, restarted it,
>>>>>>> and let it rejoin. Streaming appears to be done (according to netstats);
>>>>>>> right now it appears to be running compaction and building a secondary
>>>>>>> index (according to compactionstats). Just sit and wait, I guess?
>>>>>>>
>>>>>>> On Fri, Apr 18, 2014 at 2:23 PM, Steven A Robenalt <srobe...@stanford.edu> wrote:
>>>>>>>
>>>>>>>> Looking back through this email chain, it looks like Phil said he
>>>>>>>> wasn't using vnodes.
>>>>>>>>
>>>>>>>> For the record, we have been using vnodes since we brought up our first
>>>>>>>> cluster, and have not seen any issues with bootstrapping new nodes,
>>>>>>>> either to replace existing nodes or to grow/shrink the cluster. We did
>>>>>>>> adhere to the caveats that new nodes should not be seed nodes, and that
>>>>>>>> we should allow each node to join the cluster completely before making
>>>>>>>> any other changes.
>>>>>>>>
>>>>>>>> Phil, when you dropped to adding just the single node to your
>>>>>>>> cluster, did you start over with the newly added node (blowing away the
>>>>>>>> database created on the previous startup), or did you shut down the
>>>>>>>> other 2 added nodes and leave the remaining one in progress to continue?
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>> On Fri, Apr 18, 2014 at 10:40 AM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Apr 18, 2014 at 5:05 AM, Phil Burress <philburress...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> nodetool netstats shows 84 files. They are all at 100%. Nothing is
>>>>>>>>>> showing in Pending or Active for Read Repair Stats.
>>>>>>>>>>
>>>>>>>>>> I'm assuming this means it's done, but it still shows "JOINING".
>>>>>>>>>> Is there an undocumented step I'm missing here? This whole process
>>>>>>>>>> seems broken to me.
>>>>>>>>>
>>>>>>>>> Lately it seems like a lot more people than usual are:
>>>>>>>>>
>>>>>>>>> 1) using vnodes
>>>>>>>>> 2) unable to bootstrap new nodes
>>>>>>>>>
>>>>>>>>> If I were you, I would likely file a JIRA detailing your negative
>>>>>>>>> experience with this core functionality.
>>>>>>>>>
>>>>>>>>> =Rob
>>>>>>>>
>>>>>>>> --
>>>>>>>> Steve Robenalt
>>>>>>>> Software Architect
>>>>>>>> HighWire | Stanford University
>>>>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>>>>
>>>>>>>> srobe...@stanford.edu
>>>>>>>> http://highwire.stanford.edu