We have run into very long-running nodetool repairs when running them node by
node on a really large dataset. In some cases a repair kept running for a week.
IMO the strategy you have chosen, repairing token ranges with -st and -et, is a
good one: it does the same work in small increments whose logs can be analyzed
easily.

In addition, I would suggest using the -h option to connect to the node from
outside, and keeping in mind that nodetool ring also returns negative tokens,
which the 'for' loop has to handle. You can go from -2^63 to the first ring
value, then from (there+1) to the next token value. Better not to use i+=2,
because token values are not necessarily even numbers.
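A minimal Python sketch of walking the whole ring one range at a time, assuming the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63-1) and that a repair range is start-exclusive and end-inclusive, as with Cassandra's internal Range type. Under that assumption each token is paired directly with the next one (stepping by one, never by two) instead of adding 1 to the start; adjust if your version treats -st differently. The function names and the host value are hypothetical.

```python
# Sketch: build contiguous -st/-et ranges covering the whole Murmur3 ring once.
# Assumption: ranges are (start, end], i.e. the start token itself is excluded.
MIN_TOKEN = -2**63      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1   # Murmur3Partitioner maximum token


def ring_ranges(tokens):
    """Return (start, end) pairs between consecutive ring tokens."""
    ts = sorted(set(int(t) for t in tokens))  # numeric sort handles negative tokens
    if not ts:
        return []
    ranges = []
    if ts[0] > MIN_TOKEN:
        ranges.append((MIN_TOKEN, ts[0]))     # from -2^63 up to the first token
    ranges += list(zip(ts, ts[1:]))           # every consecutive pair, step of one
    if ts[-1] < MAX_TOKEN:
        ranges.append((ts[-1], MAX_TOKEN))    # wrap-around tail of the ring
    return ranges


def repair_commands(host, tokens):
    """One nodetool invocation per range; `host` is a hypothetical placeholder."""
    return [
        ["nodetool", "-h", host, "repair", "-st", str(start), "-et", str(end)]
        for start, end in ring_ranges(tokens)
    ]
```

Each command list can be passed to subprocess.run(cmd, check=True) one after another, so a failed range stops the loop and shows up clearly in the logs.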

Regards,
Tarun

From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Sunday, May 24, 2015 6:31 AM
To: user@cassandra.apache.org
Subject: Re: Periodic Anti-Entropy repair

You should use nodetool repair -pr on every node to make sure that each range 
is repaired only once.
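A minimal Python sketch of running -pr on every node in turn, reaching each node remotely through nodetool's -h option. The host names and the optional keyspace argument are hypothetical placeholders; by default it only prints the commands.

```python
import subprocess


def primary_range_repair_commands(hosts, keyspace=None):
    """Build `nodetool -h <host> repair -pr [keyspace]` for each host."""
    cmds = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", "-pr"]
        if keyspace:
            cmd.append(keyspace)  # optional positional keyspace argument
        cmds.append(cmd)
    return cmds


def run_sequentially(cmds, dry_run=True):
    """Run the repairs one node at a time; stop at the first failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Running the nodes one after another, rather than in parallel, keeps repair load on a single replica set at a time.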


Thanks
Anuj Wadehra

Sent from Yahoo Mail on 
Android<https://overview.mail.yahoo.com/mobile/?.src=Android>

________________________________
From: "Brice Argenson" <bargen...@gmail.com>
Date: Sat, 23 May, 2015 at 12:31 am
Subject: Periodic Anti-Entropy repair

Hi everyone,

We are currently migrating from DSE to Apache Cassandra, and we would like to
set up automatic, periodic nodetool repair runs to replace the ones executed
by OpsCenter.

I want to create a script/service that would run something like this:

# field 8 of each `nodetool ring` line is expected to be the token
token_rings=($(nodetool ring | awk '{print $8}'))
for (( i = 0; i < ${#token_rings[@]}; i += 2 )); do
   nodetool repair -st "${token_rings[i]}" -et "${token_rings[i+1]}"
done
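The awk '{print $8}' step depends on the exact column layout of nodetool ring (a Load value such as "105.36 KB" takes two fields, for example). A slightly more defensive Python sketch keeps the last field of each line when it parses as an integer, which skips header, "Datacenter:" and note lines and handles negative tokens. It assumes the token is the last column of each data row; check this against your Cassandra version. The function name is hypothetical.

```python
def tokens_from_ring(output):
    """Collect unique tokens from `nodetool ring` text, sorted numerically.

    Assumption: the token is the last whitespace-separated field of a data row.
    """
    tokens = set()
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue                      # blank line
        try:
            tokens.add(int(fields[-1]))   # int() accepts negative tokens
        except ValueError:
            continue                      # header, "Datacenter:", "====", notes
    return sorted(tokens)
```

Usage: tokens_from_ring(subprocess.run(["nodetool", "ring"], capture_output=True, text=True).stdout).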

The script/service would run every week (our gc_grace_seconds is 10 days) and
would repair all the ranges of the ring one by one.

I also searched a bit on Google and found this script:
https://github.com/BrianGallew/cassandra_range_repair
It seems to do something equivalent, but it also seems to run the repair node
by node instead of over the complete ring.
From my understanding, that would mean the script has to be run on every node
of the cluster, and that every token range would be repaired as many times as
the number of replicas containing it.
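To make the replica-count argument concrete, the following Python sketch counts how often each range is repaired when plain nodetool repair runs on every node versus nodetool repair -pr. It is a toy model under stated assumptions: one token per node (no vnodes) and SimpleStrategy-style placement, where a range is replicated on its owner plus the next rf-1 nodes clockwise. It does not model what cassandra_range_repair itself does.

```python
from collections import Counter


def replicas(range_index, n_nodes, rf):
    """Assumed placement: primary owner plus the next rf-1 nodes clockwise."""
    return [(range_index + k) % n_nodes for k in range(rf)]


def repair_counts(n_nodes, rf, primary_only):
    """Count repairs per range when every node runs one repair."""
    if not 1 <= rf <= n_nodes:
        raise ValueError("rf must be between 1 and n_nodes")
    counts = Counter()
    for node in range(n_nodes):
        for r in range(n_nodes):            # range r is primarily owned by node r
            owners = replicas(r, n_nodes, rf)
            if primary_only:
                if owners[0] == node:       # -pr: only the node's primary range
                    counts[r] += 1
            elif node in owners:            # full repair: every range it replicates
                counts[r] += 1
    return counts
```

With six nodes and rf=3, every range is repaired three times without -pr and exactly once with it, which matches the point about repeated work.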


Is there something I misunderstand?
Which approach is better?
How do you handle your Periodic Anti-Entropy Repairs?


Thanks a lot!



Reply via email to