Re: Cassandra 4.0 hanging on restart

Bowen Song Thu, 27 Jan 2022 04:39:05 -0800

Hi Paul,

It's only problematic if you are trying to do *a lot of* subrange incremental repairs. The whole point of incremental repair is that each repair is incremental and only touches the recently changed data, therefore you shouldn't need to split each node into too many subranges to reduce the required time and/or resources. For example, if you split each node into 16 subranges (e.g. 16 vnodes), on a cluster with a single DC and RF=3, only 48 rows will stay in the system.repairs table on each node. I don't see how 48 rows would cause any noticeable performance impact.
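To make the arithmetic concrete, here is a rough sketch in Python. It assumes the row count is simply the number of subranges per node multiplied by RF, because each node also takes part in the sessions for the ranges it replicates. The function name is made up for illustration and is not part of Cassandra.

```python
def repair_rows_per_node(subranges_per_node: int, replication_factor: int) -> int:
    """Estimate the rows left in system.repairs on each node.

    Assumption: one row per repair session the node takes part in, and the
    node takes part in the sessions of every range it holds a replica of.
    """
    return subranges_per_node * replication_factor


# 16 subranges per node with RF=3, as in the example above
print(repair_rows_per_node(16, 3))
```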

Getting repair right is hard, which is why I recommended Cassandra Reaper. The cron job route will suffer from a lot of issues. For example, because you can't run repair with -st and -et across a vnode boundary, each time you add or remove a node you will need to update the ranges not only on the newly added/removed node, but also on many other nodes. The amount of work to implement a script that does the token range auto-discovery is simply not worth the effort when you could spend the same time securing remote JMX and setting up Cassandra Reaper instead.
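To give an idea of what even the simplest part of such a script involves, below is a minimal Python sketch that splits each vnode range into subranges so that no -st/-et pair crosses a vnode boundary. It is only an illustration under assumptions: the function names are hypothetical, the list of ring tokens has to be collected separately and refreshed after every topology change, and the wrap-around range and token formatting are not handled.

```python
from typing import List, Tuple


def split_range(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split the non-wrapping token range (start, end] into up to `parts` subranges."""
    if start >= end:
        raise ValueError("wrapping ranges are not handled in this sketch")
    width = end - start
    # Never ask for more subranges than there are tokens in the range.
    parts = max(1, min(parts, width))
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(ring_tokens: List[int], parts: int) -> List[str]:
    """Build nodetool repair commands that stay inside a single vnode range."""
    tokens = sorted(ring_tokens)
    commands = []
    # Consecutive tokens delimit one vnode range. The wrap-around range
    # (last token back to the first) is skipped here for simplicity.
    for start, end in zip(tokens[:-1], tokens[1:]):
        for sub_start, sub_end in split_range(start, end, parts):
            commands.append(f"nodetool repair -st {sub_start} -et {sub_end}")
    return commands


# Hypothetical ring with three tokens, each vnode range split in two
for command in repair_commands([0, 100, 300], parts=2):
    print(command)
```

Everything this sketch leaves out (discovering the tokens, handling the wrap-around range, regenerating the commands on every node after a topology change) is the work that Reaper already does for you.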



Regards,

Bowen


On 27/01/2022 11:46, Paul Chandler wrote:

Thanks Erick and Bowen
I do find all the different parameters for repairs confusing, and even reading up on it now, I see Datastax warns against incremental repairs with -pr, but then the code here seems to negate the need for this warning.
Anyway, running it like this produces data in the system.repairs table, so I assume it is doing incremental repairs.
nodetool -h localhost -p 7199 repair -pr -st +02596488670266845384 -et +02613877898679419724
Then running it like this produces no data in the table, so again I assume that means it is doing full repairs.
nodetool -h localhost -p 7199 repair -pr -full -st +02596488670266845384 -et +02613877898679419724
Yesterday I recompiled the Cassandra 4.0.0 code with extra logging in the following method
https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L338
This showed me that the extra 10 minutes (and more) on some clusters is being taken up in the for loop reading the rows from the system.repairs table.
So this does seem to be an issue if you are trying to do incremental range repairs in 4.0.
Thanks

Paul
On 27 Jan 2022, at 10:27, Bowen Song <bo...@bso.ng> wrote:

Hi Erick,
From the source code: https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/StorageService.java#L4042
The -pr option has no effect if -st and -et are specified. Therefore, the command results in an incremental repair.
Cheers,

Bowen

On 27/01/2022 01:32, Erick Ramirez wrote:
I just came across this thread and noted that you're running repairs with -pr which are not incremental repairs. Was that a typo? Cheers!
