Hi Paul,
It's only problematic if you are trying to do *a lot of* subrange
incremental repairs. The whole point of incremental repair is that each
repair only touches recently changed data, so you shouldn't need to
split each node into many subranges to reduce the required time and/or
resources. For example, if you split each node into 16 subranges (e.g.
16 vnodes) on a cluster with a single DC and RF=3, only 48 rows will
remain in the system.repairs table on each node. I don't see 48 rows
causing any noticeable performance impact.
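To make the arithmetic behind the 48-row figure explicit, here is a
back-of-envelope sketch in bash. It assumes (an assumption, not checked
against the 4.0 source) that each subrange repair session leaves one row
in system.repairs on each of the RF replicas involved, so in a single DC
a node holds rows for its own subranges plus those of its RF-1 replica
neighbours. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough count of system.repairs rows per node.
# Assumptions: single DC, every node splits its ranges into the same number
# of subranges, and each subrange session stores one row on each of the RF
# replicas it involves.
estimate_repairs_rows() {
  local subranges_per_node=$1 rf=$2
  # A node takes part in its own sessions plus those of its RF-1 neighbours.
  echo $(( subranges_per_node * rf ))
}

estimate_repairs_rows 16 3   # prints 48
```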
Getting repair right is hard, which is why I recommended Cassandra
Reaper. The cron job route will suffer from a lot of issues. For
example, because you can't run repair with -st and -et across a vnode
boundary, each time you add or remove a node you will need to update the
ranges not only on the newly added/removed node, but also on many other
nodes. Implementing a script that does the token range auto-discovery is
simply not worth the effort when you could spend the same time securing
remote JMX and setting up Cassandra Reaper instead.
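To illustrate the per-vnode bookkeeping described above, here is a
minimal dry-run sketch in bash. It splits one vnode's token range
(start, end] into N equal subranges and prints a nodetool repair -st/-et
command for each, so no pair crosses the vnode boundary. The function
name is hypothetical. It assumes the range boundaries have already been
obtained elsewhere (for example from `nodetool describering`), that
tokens are plain decimal Murmur3 tokens without leading zeros, that the
range does not wrap around the ring, and that end - start fits in a
signed 64-bit integer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (but do not run) nodetool commands that repair
# one vnode's token range (start, end] as N equal subranges, so that no
# -st/-et pair crosses the vnode boundary.
# Assumptions: plain decimal Murmur3 tokens without leading zeros, a range
# that does not wrap around the ring (start < end), and end - start fitting
# in a signed 64-bit integer.
split_vnode_range() {
  local start=$1 end=$2 n=$3
  if (( n < 1 || start >= end )); then
    echo "unsupported range or count: ($start, $end] into $n" >&2
    return 1
  fi
  local step=$(( (end - start) / n ))
  if (( step == 0 )); then
    echo "range too small to split into $n subranges" >&2
    return 1
  fi
  local i lo hi
  for (( i = 0; i < n; i++ )); do
    lo=$(( start + i * step ))
    # The last subrange ends exactly on the vnode boundary, absorbing any
    # remainder left over from the integer division.
    if (( i == n - 1 )); then
      hi=$end
    else
      hi=$(( lo + step ))
    fi
    echo "nodetool repair -st $lo -et $hi"
  done
}
```

For example, `split_vnode_range 0 100 4` prints four commands, the last
being `nodetool repair -st 75 -et 100`. Whenever a node joins or leaves,
the input ranges change, so every generated command list would have to
be regenerated, which is the maintenance cost of the cron job route.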
Regards,
Bowen
On 27/01/2022 11:46, Paul Chandler wrote:
Thanks Erick and Bowen
I do find all the different repair parameters confusing. Even reading
up on it now, I see DataStax warns against incremental repairs with -pr,
but the code here seems to make that warning unnecessary.
Anyway, running it like this produces data in the system.repairs
table, so I assume it is doing incremental repairs:
nodetool -h localhost -p 7199 repair -pr -st +02596488670266845384 -et +02613877898679419724
Running it like this, on the other hand, produces no data in the table,
so again I'm assuming that means it is doing full repairs:
nodetool -h localhost -p 7199 repair -pr -full -st +02596488670266845384 -et +02613877898679419724
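The tokens in these commands are written with an explicit sign and zero
padding. nodetool evidently accepts that form, since both commands ran,
but a cron script that does arithmetic on such tokens in bash would trip
over it: bash reads a leading zero as an octal prefix and rejects the
digits 8 and 9. A minimal sketch of a normalising helper, assuming the
tokens are always in this signed, zero-padded decimal form (the function
name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a signed, zero-padded token such as
# +02596488670266845384 into plain decimal (2596488670266845384) so that
# bash arithmetic does not treat the leading zero as an octal prefix.
normalize_token() {
  local t=$1 sign=""
  case $t in
    -*) sign="-"; t=${t#-} ;;
    +*) t=${t#+} ;;
  esac
  # Strip leading zeros one at a time, keeping at least one digit.
  while [ "${#t}" -gt 1 ] && [ "${t#0}" != "$t" ]; do
    t=${t#0}
  done
  printf '%s%s\n' "$sign" "$t"
}
```

`normalize_token +02596488670266845384` prints `2596488670266845384`.
Written out plainly, the range in the commands above spans
2613877898679419724 - 2596488670266845384 = 17389228412574340 tokens.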
Yesterday I recompiled the Cassandra 4.0.0 code with extra logging in
the following method
https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L338
This showed me that the extra 10 minutes (and more) on some clusters
is being spent in the for loop that reads the rows from the
system.repairs table.
So this does seem to be an issue if you are trying to do incremental
range repairs in 4.0.
Thanks
Paul
On 27 Jan 2022, at 10:27, Bowen Song <bo...@bso.ng> wrote:
Hi Erick,
From the source code:
https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/StorageService.java#L4042
The -pr option has no effect if -st and -et are specified. Therefore,
the command results in an incremental repair, since incremental is the
default unless -full is given.
Cheers,
Bowen
On 27/01/2022 01:32, Erick Ramirez wrote:
I just came across this thread and noticed that you're running repairs
with -pr, which are not incremental repairs. Was that a typo? Cheers!