Re: Bootstrap Timing

Phil Burress Fri, 25 Apr 2014 06:56:42 -0700

Just a follow-up on this for any interested parties. Ultimately we've
determined that the bootstrap/join process is broken in Cassandra. We ended
up creating an entirely new cluster and migrating the data.



On Mon, Apr 21, 2014 at 10:32 AM, Phil Burress <philburress...@gmail.com> wrote:

> The new node has managed to stay up without dying for about 24 hours
> now... but it is still in the JOINING state. A new concern has popped up. Disk
> usage is at 500GB on the new node. The three original nodes have about 40GB
> each. Any ideas why this is happening?
>
>
> On Sat, Apr 19, 2014 at 9:19 PM, Phil Burress <philburress...@gmail.com> wrote:
>
>> Thank you all for your advice and good info. The node has died a couple
>> of times with out-of-memory errors. I've restarted each time but it starts
>> re-running compaction and then dies again.
>>
>> Is there a better way to do this?
>>
>> On Apr 18, 2014 6:06 PM, "Steven A Robenalt" <srobe...@stanford.edu>
>> wrote:
>>
>>> That's what I'd be doing, but I wouldn't expect it to run for 3 days
>>> this time. My guess is that whatever was going wrong with the bootstrap
>>> when you had 3 nodes starting at once was interfering with the completion
>>> of the 1 remaining node of those 3. A clean bootstrap of a single node
>>> should complete eventually, and I would think it'll be a lot less than 3
>>> days. Our database is much smaller than yours at the moment, so I can't
>>> really guide you on how long it should take, but I'd think that others on
>>> the list with similar database sizes might be able to give you a better
>>> idea.
>>>
>>> Steve
>>>
>>>
>>>
>>> On Fri, Apr 18, 2014 at 1:43 PM, Phil Burress
>>> <philburress...@gmail.com> wrote:
>>>
>>>> First, I just stopped 2 of the nodes and left one running. But this
>>>> morning, I stopped that third node, cleared out the data, restarted and let
>>>> it rejoin again. Streaming appears to be done (according to netstats);
>>>> right now it appears to be running compaction and building a secondary
>>>> index (according to compactionstats). Just sit and wait, I guess?
>>>>
>>>>
>>>> On Fri, Apr 18, 2014 at 2:23 PM, Steven A Robenalt <
>>>> srobe...@stanford.edu> wrote:
>>>>
>>>>> Looking back through this email chain, it looks like Phil said he
>>>>> wasn't using vnodes.
>>>>>
>>>>> For the record, we have been using vnodes since we brought up our first
>>>>> cluster, and have not seen any issues with bootstrapping new nodes either
>>>>> to replace existing nodes, or to grow/shrink the cluster. We did adhere to
>>>>> the caveats that new nodes should not be seed nodes, and that we should
>>>>> allow each node to join the cluster completely before making any other
>>>>> changes.
>>>>>
>>>>> Phil, when you dropped to adding just the single node to your cluster,
>>>>> did you start over with the newly added node (blowing away the database
>>>>> created on the previous startup), or did you shut down the other 2 added
>>>>> nodes and leave the remaining one in progress to continue?
>>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>> On Fri, Apr 18, 2014 at 10:40 AM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>>
>>>>>> On Fri, Apr 18, 2014 at 5:05 AM, Phil Burress <
>>>>>> philburress...@gmail.com> wrote:
>>>>>>
>>>>>>> nodetool netstats shows 84 files. They are all at 100%. Nothing
>>>>>>> showing in Pending or Active for Read Repair Stats.
>>>>>>>
>>>>>>> I'm assuming this means it's done. But it still shows "JOINING". Is
>>>>>>> there an undocumented step I'm missing here? This whole process seems
>>>>>>> broken to me.
>>>>>>>
>>>>>>
>>>>>> Lately it seems like a lot more people than usual are:
>>>>>>
>>>>>> 1) using vnodes
>>>>>> 2) unable to bootstrap new nodes
>>>>>>
>>>>>> If I were you, I would likely file a JIRA detailing your negative
>>>>>> experience with this core functionality.
>>>>>>
>>>>>> =Rob
>>>>>>
>>>>> --
>>>>> Steve Robenalt
>>>>> Software Architect
>>>>> HighWire | Stanford University
>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>
>>>>> srobe...@stanford.edu
>>>>> http://highwire.stanford.edu
>>>>
>>>
>>>
>
