Re: Recovering from a faulty cassandra node

Jabbar Azam Mon, 25 Mar 2013 03:14:26 -0700

nodetool cleanup took about 23.5 hours on each node (I ran it on all the nodes
in parallel). I started the nodetool cleanup at 20:53 on March 22 and it's
still running (10:08 on 25 March).
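
As a rough check of the arithmetic, the time between the two timestamps quoted above can be worked out like this. It is only a sketch: the year 2013 is taken from the mail header, and both times are assumed to be in the same time zone.

```python
from datetime import datetime

# Timestamps from the message above (year assumed from the mail header).
started = datetime(2013, 3, 22, 20, 53)
checked = datetime(2013, 3, 25, 10, 8)

# Difference between the two points in time, expressed in hours.
elapsed = checked - started
elapsed_hours = elapsed.total_seconds() / 3600
print(f"elapsed so far: {elapsed_hours:.2f} hours")
```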


RF = 3. The load on the four nodes is 490 GB, 491 GB, 323 GB and 476 GB.
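
To put those figures in context, here is a small sketch of what RF = 3 implies. The quorum formula (RF // 2 + 1) is standard; the estimate of unique data is only an assumption that every node's load consists purely of live replicas, which may not hold while leftover data is still on disk.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


RF = 3
loads_gb = [490, 491, 323, 476]  # per-node load reported above

total_gb = sum(loads_gb)
# Rough estimate only: assumes each row is stored exactly RF times
# and that no stale data remains on any node.
approx_unique_gb = total_gb / RF

print(f"QUORUM needs {quorum(RF)} of {RF} replicas")
print(f"total load {total_gb} GB, roughly {approx_unique_gb:.0f} GB before replication")
```

With RF = 3 a QUORUM operation needs 2 replicas, so it can still succeed with one replica of a given range down.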

I think I read somewhere that removenode is faster the more nodes there are in
the cluster.

My next email will be the last in the thread. I thought the info might be
useful to other people in the community.
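
For anyone wanting to script the sequence described in this thread (check the re-added node with nodetool status, run nodetool cleanup on the old nodes, then nodetool removenode for the old entry), a minimal sketch follows. It only builds the command lines and does not execute anything. The host addresses and the host ID are hypothetical placeholders, and the order simply mirrors what was done here, not necessarily the recommended procedure.

```python
from typing import List


def recovery_plan(old_nodes: List[str], old_host_id: str) -> List[List[str]]:
    """Return the nodetool invocations, in order, as argument lists.

    Nothing is executed; the caller decides how and when to run them.
    """
    # Confirm the wiped node has joined before touching anything else.
    plan = [["nodetool", "status"]]
    # Cleanup is I/O intensive, so it is listed once per old node.
    for host in old_nodes:
        plan.append(["nodetool", "-h", host, "cleanup"])
    # removenode takes the host ID shown by nodetool status.
    plan.append(["nodetool", "removenode", old_host_id])
    return plan


# Hypothetical addresses and host ID, for illustration only.
example_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
example_plan = recovery_plan(example_nodes, "HOST-ID-OF-OLD-NODE")
for command in example_plan:
    print(" ".join(command))
```

Each entry can be passed to subprocess.run once the plan has been checked by hand; on a live cluster the cleanup steps are probably better run one node at a time.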





On 21 March 2013 21:59, Jabbar Azam <aja...@gmail.com> wrote:

> The nodetool cleanup command removes keys which can be deleted from the node
> the command is run on. So I'm assuming I can run nodetool cleanup on all the
> old nodes in parallel. I wouldn't do this on a live cluster as it's I/O
> intensive on each node.
>
>
> On 21 March 2013 17:26, Jabbar Azam <aja...@gmail.com> wrote:
>
>> Can I do a multiple node nodetool cleanup on my test cluster?
>> On 21 Mar 2013 17:12, "Jabbar Azam" <aja...@gmail.com> wrote:
>>
>>>
>>> All cassandra-topology.properties are the same.
>>>
>>> The node add appears to be successful. I can see it using nodetool
>>> status. I'm doing a node cleanup on the old nodes and then will do a node
>>> remove, to remove the old node. The actual node join took about 6 hours.
>>> The wiped node (now the new node) has about 324 GB of files in
>>> /var/lib/cassandra.
>>>
>>>
>>>
>>>
>>>
>>> On 21 March 2013 16:58, aaron morton <aa...@thelastpickle.com> wrote:
>>>
>>>>  Not sure if I needed to change cassandra-topology.properties file on
>>>> the existing nodes.
>>>>
>>>> If you are using the PropertyFileSnitch all nodes need to have the same
>>>> cassandra-topology.properties file.
>>>>
>>>> Cheers
>>>>
>>>>    -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 21/03/2013, at 1:34 AM, Jabbar Azam <aja...@gmail.com> wrote:
>>>>
>>>> I've added the node with a different IP address and after disabling the
>>>> firewall data is being streamed from the existing nodes to the wiped node.
>>>> I'll do a cleanup, followed by remove node once it's done.
>>>>
>>>> I've also added the new node to the existing nodes'
>>>> cassandra-topology.properties file and restarted them. I also found I had
>>>> iptables switched on and couldn't understand why the wiped node couldn't
>>>> see the cluster. Not sure if I needed to change
>>>> cassandra-topology.properties file on the existing nodes.
>>>>
>>>>
>>>>
>>>>
>>>> On 19 March 2013 15:49, Jabbar Azam <aja...@gmail.com> wrote:
>>>>
>>>>> Do I use removenode before adding the reinstalled node or after?
>>>>>
>>>>>
>>>>> On 19 March 2013 15:45, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>>>>
>>>>>> In 1.2, you may want to use nodetool removenode if your server is
>>>>>> broken or unreachable; otherwise I guess nodetool decommission remains
>>>>>> the right way to remove a node. (
>>>>>> http://www.datastax.com/docs/1.2/references/nodetool)
>>>>>>
>>>>>> When this node is out, rm -rf /yourpath/cassandra/* on this server,
>>>>>> change the configuration if needed (not sure about the auto_bootstrap
>>>>>> param) and start Cassandra on that node again. It should join the ring
>>>>>> as a new node.
>>>>>>
>>>>>> Good luck.
>>>>>>
>>>>>>
>>>>>> 2013/3/19 Hiller, Dean <dean.hil...@nrel.gov>
>>>>>>
>>>>>>> Since you "cleared" out that node, it IS the replacement node.
>>>>>>>
>>>>>>> Dean
>>>>>>>
>>>>>>> From: Jabbar Azam <aja...@gmail.com>
>>>>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>>> Date: Tuesday, March 19, 2013 9:29 AM
>>>>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>>> Subject: Re: Recovering from a faulty cassandra node
>>>>>>>
>>>>>>> Hello Dean.
>>>>>>>
>>>>>>> I'm using vnodes so can't specify a token. In addition I can't
>>>>>>> follow the replace node docs because I don't have a replacement node.
>>>>>>>
>>>>>>>
>>>>>>> On 19 March 2013 15:25, Hiller, Dean <dean.hil...@nrel.gov> wrote:
>>>>>>> I have not done this as of yet but from all that I have read your
>>>>>>> best option is to follow the replace node documentation, for which I
>>>>>>> believe you need to
>>>>>>>
>>>>>>>
>>>>>>>  1.  Have the token be the same BUT add 1 to it so it doesn't think
>>>>>>> it's the same computer
>>>>>>>  2.  Have the bootstrap option set or something so streaming takes
>>>>>>> effect.
>>>>>>>
>>>>>>> I would however test all of that out in QA to make sure it works. If
>>>>>>> you have QUORUM reads/writes, a good part of that test would be to take
>>>>>>> node X down after your node Y is back in the cluster to make sure
>>>>>>> reads/writes are working on the node you fixed. You just need to make
>>>>>>> sure node X shares one of the token ranges of node Y AND your
>>>>>>> writes/reads are in that token range.
>>>>>>>
>>>>>>> Dean
>>>>>>>
>>>>>>> From: Jabbar Azam <aja...@gmail.com>
>>>>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>>> Date: Tuesday, March 19, 2013 8:51 AM
>>>>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>>> Subject: Recovering from a faulty cassandra node
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I
>>>>>>> spent over a week inserting lots of data into the cluster. Towards
>>>>>>> the end of the process one of the nodes had a hardware fault.
>>>>>>>
>>>>>>> I have fixed the hardware fault but the filing system on that node
>>>>>>> is corrupt so I'll have to reinstall the OS and cassandra.
>>>>>>>
>>>>>>> I can think of two ways of reintegrating the host into the cluster
>>>>>>>
>>>>>>> 1) shrink the cluster to three nodes and add the node into the
>>>>>>> cluster
>>>>>>>
>>>>>>> 2) Add the node into the cluster without shrinking
>>>>>>>
>>>>>>> I'm not sure of the best approach to take and I'm not sure how to
>>>>>>> achieve each step.
>>>>>>>
>>>>>>> Can anybody help?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>>
>>>>>>>  Jabbar Azam
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>>
>>>>>>> Jabbar Azam
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks
>>>>>
>>>>> Jabbar Azam
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>>
>>>> Jabbar Azam
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks
>>>
>>> Jabbar Azam
>>>
>>
>
>
> --
> Thanks
>
> Jabbar Azam
>



-- 
Thanks

Jabbar Azam
