Re: Bootstrap Timing

Steven A Robenalt Fri, 25 Apr 2014 08:39:03 -0700

Interesting. I did our 2.0.3 -> 2.0.5 upgrade by bootstrapping/joining each
node into our cluster, one at a time, then retiring the old nodes one at a
time. Maybe something specific to the 2.0.6 release?
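
For anyone following the one-node-at-a-time approach, here is a minimal
sketch of a check that could gate each step. It assumes that each node row
in `nodetool status` output starts with a two-letter state code (UN, UJ, UL,
UM, DN and so on). The function names are hypothetical and the parsing is
illustrative only, not an official tool.

```python
import subprocess

# Second letter of the state code: N = normal, L = leaving,
# J = joining, M = moving. Anything other than N is "in transition".
TRANSITIONAL = {"J", "L", "M"}


def nodes_in_transition(status_text):
    """Return (state, address) pairs for nodes that are joining, leaving or moving."""
    found = []
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        state = parts[0]
        # Node rows begin with a code such as UN or UJ; header lines do not.
        if len(state) == 2 and state[0] in "UD" and state[1] in "NLJM":
            if state[1] in TRANSITIONAL:
                found.append((state, parts[1]))
    return found


def cluster_is_settled(status_text):
    """True when no node is joining, leaving or moving."""
    return not nodes_in_transition(status_text)


def read_nodetool_status():
    """Run `nodetool status` locally (assumes nodetool is on the PATH)."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

The idea would be to call cluster_is_settled(read_nodetool_status()) before
starting the next new node or retiring the next old one, and to wait while it
returns False.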


Good to hear that you've gotten through it anyway.

Steve



On Fri, Apr 25, 2014 at 7:49 AM, Phil Burress <philburress...@gmail.com> wrote:

> Cassandra 2.0.6
>
>
> On Fri, Apr 25, 2014 at 10:31 AM, James Rothering <jrother...@codojo.me> wrote:
>
>> What version of C* is this?
>>
>>
>> On Fri, Apr 25, 2014 at 6:55 AM, Phil Burress <philburress...@gmail.com>
>> wrote:
>>
>>> Just a follow-up on this for any interested parties. Ultimately we've
>>> determined that the bootstrap/join process is broken in Cassandra. We ended
>>> up creating an entirely new cluster and migrating the data.
>>>
>>>
>>> On Mon, Apr 21, 2014 at 10:32 AM, Phil Burress <philburress...@gmail.com>
>>> wrote:
>>>
>>>> The new node has managed to stay up without dying for about 24 hours
>>>> now... but it still is in JOINING state. A new concern has popped up. Disk
>>>> usage is at 500GB on the new node. The three original nodes have about 40GB
>>>> each. Any ideas why this is happening?
>>>>
>>>>
>>>> On Sat, Apr 19, 2014 at 9:19 PM, Phil Burress <philburress...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you all for your advice and good info. The node has died a
>>>>> couple of times with out-of-memory errors. I've restarted it each time,
>>>>> but it starts re-running compaction and then dies again.
>>>>>
>>>>> Is there a better way to do this?
>>>>> On Apr 18, 2014 6:06 PM, "Steven A Robenalt" <srobe...@stanford.edu>
>>>>> wrote:
>>>>>
>>>>>> That's what I'd be doing, but I wouldn't expect it to run for 3 days
>>>>>> this time. My guess is that whatever was going wrong with the bootstrap
>>>>>> when you had 3 nodes starting at once was interfering with the completion
>>>>>> of the 1 remaining node of those 3. A clean bootstrap of a single node
>>>>>> should complete eventually, and I would think it'll be a lot less than 3
>>>>>> days. Our database is much smaller than yours at the moment, so I can't
>>>>>> really guide you on how long it should take, but I'd think that others on
>>>>>> the list with similar database sizes might be able to give you a better
>>>>>> idea.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 18, 2014 at 1:43 PM, Phil Burress <
>>>>>> philburress...@gmail.com> wrote:
>>>>>>
>>>>>>> First, I just stopped 2 of the nodes and left one running. But this
>>>>>>> morning, I stopped that third node, cleared out the data, restarted
>>>>>>> it, and let it rejoin again. It appears streaming is done (according
>>>>>>> to netstats); right now it appears to be running compaction and
>>>>>>> building a secondary index (according to compactionstats). Just sit
>>>>>>> and wait, I guess?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 18, 2014 at 2:23 PM, Steven A Robenalt <
>>>>>>> srobe...@stanford.edu> wrote:
>>>>>>>
>>>>>>>> Looking back through this email chain, it looks like Phil said he
>>>>>>>> wasn't using vnodes.
>>>>>>>>
>>>>>>>> For the record, we have been using vnodes since we brought up our
>>>>>>>> first cluster, and have not seen any issues with bootstrapping new
>>>>>>>> nodes, either to replace existing nodes or to grow/shrink the
>>>>>>>> cluster. We did adhere to the caveats that new nodes should not be
>>>>>>>> seed nodes, and that we should allow each node to join the cluster
>>>>>>>> completely before making any other changes.
>>>>>>>>
>>>>>>>> Phil, when you dropped to adding just the single node to your
>>>>>>>> cluster, did you start over with the newly added node (blowing away
>>>>>>>> the database created on the previous startup), or did you shut down
>>>>>>>> the other 2 added nodes and leave the remaining one in progress to
>>>>>>>> continue?
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Apr 18, 2014 at 10:40 AM, Robert Coli <rc...@eventbrite.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Apr 18, 2014 at 5:05 AM, Phil Burress <
>>>>>>>>> philburress...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> nodetool netstats shows 84 files. They are all at 100%. Nothing
>>>>>>>>>> showing in Pending or Active for Read Repair Stats.
>>>>>>>>>>
>>>>>>>>>> I'm assuming this means it's done. But it still shows "JOINING".
>>>>>>>>>> Is there an undocumented step I'm missing here? This whole process
>>>>>>>>>> seems broken to me.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Lately it seems like a lot more people than usual are:
>>>>>>>>>
>>>>>>>>> 1) using vnodes
>>>>>>>>> 2) unable to bootstrap new nodes
>>>>>>>>>
>>>>>>>>> If I were you, I would likely file a JIRA detailing your negative
>>>>>>>>> experience with this core functionality.
>>>>>>>>>
>>>>>>>>> =Rob
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Steve Robenalt
>>>>>>>> Software Architect
>>>>>>>> HighWire | Stanford University
>>>>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>>>>
>>>>>>>> srobe...@stanford.edu
>>>>>>>> http://highwire.stanford.edu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>


-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobe...@stanford.edu
http://highwire.stanford.edu
