We have run into very long-running nodetool repair jobs when we ran it node by node on a really large dataset; in some cases it kept running for a week. IMO the strategy you are choosing, repairing with -st and -et, is a good one: it does the same work in small increments, and the logs of each increment can be analyzed easily.
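A minimal sketch of that subrange approach in Python, not a tested production script. It assumes the Murmur3Partitioner token space (-2^63 to 2^63-1), assumes the token is the last column of each node line in the nodetool ring output, and assumes a repair range is treated as start-exclusive and end-inclusive, so consecutive ranges share a boundary token. The function names and the default host are made up for illustration.

```python
import re

# Assumption: Murmur3Partitioner, whose tokens span -2^63 .. 2^63 - 1.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def parse_tokens(ring_output):
    """Return the sorted, de-duplicated tokens found in `nodetool ring` output.

    A node line is assumed to have several columns and to end with a signed
    integer token; header lines and the lone token line at the top are skipped.
    """
    tokens = set()
    for line in ring_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and re.fullmatch(r"-?\d+", fields[-1]):
            tokens.add(int(fields[-1]))  # negative tokens are kept as-is
    return sorted(tokens)


def subranges(tokens):
    """Split the whole token space into consecutive (start, end) pairs.

    Every token closes one range and opens the next, so the index advances
    by one, and the two edge ranges cover the ends of the token space.
    """
    inner = [t for t in tokens if MIN_TOKEN < t < MAX_TOKEN]
    bounds = [MIN_TOKEN] + inner + [MAX_TOKEN]
    return list(zip(bounds, bounds[1:]))


def repair_commands(tokens, host="127.0.0.1"):
    """Build one `nodetool repair -st ... -et ...` argument list per subrange."""
    return [
        ["nodetool", "-h", host, "repair", "-st", str(start), "-et", str(end)]
        for start, end in subranges(tokens)
    ]
```

Each argument list could be passed to subprocess.run one at a time, logging the start and end token before each call, so that a slow or failed range is easy to find and rerun.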
In addition, my suggestion would be to use the -h option to connect to the node from outside, and to take care of the fact that nodetool ring returns negative token values as well, which the 'for' loop has to handle. You can go from -2^63 to the first ring value, then from (there+1) to the next token value. Better not to use i+=2: i is an index into the token list, and stepping by two skips the range between token_rings[i+1] and token_rings[i+2].

Regards,
Tarun

From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Sunday, May 24, 2015 6:31 AM
To: user@cassandra.apache.org
Subject: Re: Periodic Anti-Entropy repair

You should use nodetool repair -pr on every node to make sure that each range is repaired only once.

Thanks
Anuj Wadehra

Sent from Yahoo Mail on Android
________________________________
From: "Brice Argenson" <bargen...@gmail.com>
Date: Sat, 23 May, 2015 at 12:31 am
Subject: Periodic Anti-Entropy repair

Hi everyone,

We are currently migrating from DSE to Apache Cassandra and we would like to put in place an automatic and periodic nodetool repair execution to replace the one executed by OpsCenter.

I wanted to create a script / service that would run something like this:

    token_rings = `nodetool ring | awk '{print $8}'`
    for(int i = 0; i < token_rings.length; i += 2) {
        `nodetool repair -st token_rings[i] -et token_rings[i+1]`
    }

That script / service would run every week (our GCGrace is 10 days) and would repair all the ranges of the ring one by one.

I also looked a bit on Google and I found this script: https://github.com/BrianGallew/cassandra_range_repair

It seems to do something equivalent, but it also seems to run the repair node by node instead of over the complete ring. From my understanding, that would mean that the script has to be run for every node of the cluster and that every token range would be repaired as many times as the number of replicas containing it.

Is there something I misunderstand? Which approach is better? How do you handle your Periodic Anti-Entropy Repairs?

Thanks a lot!