Hi Paul,

It's only problematic if you are trying to do *a lot of* subrange incremental repairs. The whole point of incremental repair is that each repair is incremental and only touches recently changed data, so you shouldn't need to split each node into too many subranges to reduce the required time and/or resources. For example, if you split each node into 16 subranges (e.g. 16 vnodes) on a cluster with a single DC and RF=3, only 48 rows will stay in the system.repairs table on each node. I don't see how 48 rows would cause any noticeable performance impact.
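
The 48-row figure can be reproduced with a back-of-the-envelope sketch. It assumes that each subrange repair session leaves one row in system.repairs on every replica, and that a node is a replica for the primary ranges of RF nodes; the function name is purely illustrative.

```python
def repairs_rows_per_node(subranges_per_node: int, replication_factor: int) -> int:
    """Estimate the rows kept in system.repairs on one node.

    Assumption: one row per repair session the node participates in, and the
    node participates in the subrange sessions of `replication_factor` nodes
    (its own primary ranges plus the ranges it holds as a replica).
    """
    return subranges_per_node * replication_factor


# 16 subranges per node, single DC, RF=3
print(repairs_rows_per_node(16, 3))  # 48
```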

Getting repair right is hard, which is why I recommended Cassandra Reaper. The cron job route will suffer from a lot of issues. For example, because you can't run repair with -st and -et across a vnode boundary, each time you add or remove a node you will need to update the ranges not only on the newly added/removed node, but also on many other nodes. The amount of work needed to implement a script that does token range auto-discovery is simply not worth the effort when you could spend the same time securing remote JMX and setting up Cassandra Reaper instead.
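
To give a sense of what even the simplest part of such a script involves, here is a minimal sketch that splits already-known vnode ranges into subranges that never cross a vnode boundary. It is not a complete solution: the vnode ranges are assumed to have been collected elsewhere (discovering them, and refreshing them after every topology change, is the hard part), wrap-around ranges and negative tokens are not handled, and the helper names are hypothetical.

```python
from typing import List, Tuple

TokenRange = Tuple[int, int]


def split_range(start: int, end: int, parts: int) -> List[TokenRange]:
    """Split a non-wrapping token range into at most `parts` contiguous subranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if end <= start:
        # The range that wraps around the ring needs special handling (not shown).
        raise ValueError("wrap-around or empty ranges are not handled in this sketch")
    width = end - start
    parts = min(parts, width)  # never produce an empty subrange
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def subrange_repair_commands(vnode_ranges: List[TokenRange], parts: int) -> List[str]:
    """Build one repair command per subrange; each subrange stays inside one vnode."""
    commands = []
    for start, end in vnode_ranges:
        for st, et in split_range(start, end, parts):
            commands.append(f"nodetool repair -st {st} -et {et}")
    return commands


# Example using the single range quoted later in this thread.
vnode_ranges = [(2596488670266845384, 2613877898679419724)]
for command in subrange_repair_commands(vnode_ranges, parts=4):
    print(command)
```

Every time a node is added or removed, `vnode_ranges` changes on several nodes, so the list would have to be regenerated everywhere before the next run. That is the bookkeeping Cassandra Reaper does for you.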


Regards,

Bowen


On 27/01/2022 11:46, Paul Chandler wrote:
Thanks Erick and Bowen

I do find all the different parameters for repairs confusing. Even reading up on it now, I see DataStax warns against incremental repairs with -pr, but the code here seems to negate the need for this warning.

Anyway, running it like this produces data in the system.repairs table, so I assume it is doing incremental repairs.

nodetool -h localhost -p 7199 repair -pr  -st +02596488670266845384 -et +02613877898679419724

Then running it like this produces no data in the table, so again I assume that means it is doing full repairs.

nodetool -h localhost -p 7199 repair -pr -full  -st +02596488670266845384 -et +02613877898679419724

Yesterday I recompiled the Cassandra 4.0.0 code with extra logging in the following method

https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L338

This showed me that the extra 10 minutes (and more) on some clusters is being taken up in the for loop reading the rows from the system.repairs table.

So this does seem to be an issue if you are trying to do incremental range repairs in 4.0.

Thanks

Paul

On 27 Jan 2022, at 10:27, Bowen Song <bo...@bso.ng> wrote:

Hi Erick,


From the source code: https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/StorageService.java#L4042

The -pr option has no effect if -st and -et are specified. Therefore, the command results in an incremental repair.


Cheers,

Bowen

On 27/01/2022 01:32, Erick Ramirez wrote:
I just came across this thread and noted that you're running repairs with -pr which are not incremental repairs. Was that a typo? Cheers!
