Re: Cassandra 4.0 hanging on restart

2022-01-27 Thread Bowen Song
Hi Paul, It's only problematic if you are trying to do *a lot of* subrange incremental repairs. The whole point of incremental repair is that each repair is incremental and only touches the recently changed data, so you shouldn't need to split each node into too many subranges to re

Re: Cassandra 4.0 hanging on restart

2022-01-27 Thread Paul Chandler
Thanks Erick and Bowen. I do find all the different parameters for repairs confusing, and even reading up on it now, I see DataStax warns against incremental repairs with -pr, but the code here seems to negate the need for this warning. Anyway, running it like this produces data in the syst

Re: Cassandra 4.0 hanging on restart

2022-01-27 Thread Bowen Song
Hi Erick, From the source code (https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/StorageService.java#L4042), the -pr option has no effect if -st and -et are specified. Therefore, the command results in an incremental repair
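The precedence Bowen describes, where an explicit -st/-et range wins and -pr is only consulted when no range is given, can be modelled in a few lines of Python. This is a simplified sketch under assumptions, not Cassandra's code: `select_repair_ranges` and its parameters are hypothetical, and the primary and local ranges are passed in rather than computed from the token ring.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # (start, end] token range


def select_repair_ranges(
    explicit_range: Optional[Range],
    primary_range_flag: bool,
    primary_ranges: List[Range],
    local_ranges: List[Range],
) -> List[Range]:
    """Hypothetical model of how a repair request picks its token ranges."""
    if explicit_range is not None:
        # -st/-et supplied: the -pr flag is never consulted
        return [explicit_range]
    if primary_range_flag:
        # -pr without -st/-et: only ranges this node owns as primary replica
        return list(primary_ranges)
    # neither option: every range this node replicates
    return list(local_ranges)


primary = [(0, 100)]
local = [(0, 100), (100, 200), (200, 300)]
print(select_repair_ranges((10, 20), True, primary, local))  # [(10, 20)]
```

In this model, passing -pr alongside -st/-et changes nothing, which is why the command in question ends up behaving like a plain subrange repair.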

Re: Cassandra 4.0 hanging on restart

2022-01-26 Thread Erick Ramirez
I just came across this thread and noticed that you're running repairs with -pr, which are not incremental repairs. Was that a typo? Cheers!

Re: Cassandra 4.0 hanging on restart

2022-01-26 Thread Bowen Song
Yes, I understand that exposing JMX securely outside localhost involves a fair amount of work. The sidecar option will use more RAM per Cassandra node, which may or may not be an issue depending on individual circumstances. It is not a quick win if neither of those two options is optimal fo

Re: Cassandra 4.0 hanging on restart

2022-01-26 Thread Paul Chandler
We don’t expose the JMX port outside localhost, so the last time we looked it was not possible. I see now there is the sidecar option, but it sounds like there are a number of caveats, particularly around resources, that may cause some issues with our setup. So at the moment Reaper does not seem lik

Re: Cassandra 4.0 hanging on restart

2022-01-26 Thread Bowen Song
I'm glad that it fixed the problem. Now, may I interest you in Cassandra Reaper? In my experience it has managed the load fairly well on large clusters. On 26/01/2022 10:19, Paul Chandler wrote: I changed the range repair to be full repair, reset the repaired

Re: Cassandra 4.0 hanging on restart

2022-01-26 Thread Paul Chandler
I changed the range repair to a full repair, reset the repairedAt value for all SSTables and deleted the old data from the system.repairs table. This did not create any new rows in the system.repairs table, and the node was able to restart without any problem, so this seems to be a soluti

Re: Cassandra 4.0 hanging on restart

2022-01-25 Thread Bowen Song
That would indicate that the "isSuperseded(session)" call returned false. After looking at the source code, it seems the subrange incremental repair is likely causing this. Would you mind trying either a subrange full repair or a full-range incremental repair? You may need to reset the "repairedAt" val
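Why subrange incremental repairs might never be cleaned up can be illustrated with a simplified Python model of the supersession check. The assumption here, taken from the thread rather than from the real `isSuperseded` implementation, is that a finalized session may only be deleted once each of its token ranges is covered by a range of some newer session. `Session`, `covers` and `is_superseded` are hypothetical names; real sessions also track tables and states that are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end] token range


@dataclass(frozen=True)
class Session:
    repaired_at: int          # hypothetical timestamp of the session
    ranges: Tuple[Range, ...]


def covers(outer: Range, inner: Range) -> bool:
    """True if `outer` fully contains `inner` (no ring wraparound handling)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def is_superseded(session: Session, all_sessions: List[Session]) -> bool:
    """Simplified model: every range of `session` lies inside a range of a newer session."""
    newer = [s for s in all_sessions if s.repaired_at > session.repaired_at]
    return all(
        any(covers(r2, r) for s in newer for r2 in s.ranges)
        for r in session.ranges
    )


FULL = (-100, 100)
full_range = [Session(day, (FULL,)) for day in range(3)]
print([is_superseded(s, full_range) for s in full_range])  # [True, True, False]

subrange = [Session(i, ((-100 + 50 * i, -50 + 50 * i),)) for i in range(4)]
print([is_superseded(s, subrange) for s in subrange])  # [False, False, False, False]
```

Under this model, each full-range session is superseded by the next one, while sessions over disjoint subranges never supersede each other, so their rows would be kept and logged as "Skipping delete of FINALIZED LocalSession".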

Re: Cassandra 4.0 hanging on restart

2022-01-25 Thread Paul Chandler
Hi Bowen, Yes, there are a large number of "Skipping delete of FINALIZED LocalSession" messages. We have a script that repairs ranges, stepping through the complete range in 5 days; this should create 1600 ranges over the 5 days. It runs commands like this: nodetool -h localhost -p 7199 repa
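A schedule like the one described can be sketched in Python as splitting the whole token ring into 1600 contiguous subranges and running 320 of them per day. This assumes the default Murmur3Partitioner, whose tokens span -2^63 to 2^63-1; `split_token_range` and `daily_commands` are hypothetical helpers, `my_keyspace` is a placeholder, and the `full` switch only toggles the `-full` flag that distinguishes the subrange full repair discussed in this thread from an incremental one.

```python
MIN_TOKEN = -(2 ** 63)   # Murmur3Partitioner minimum token
MAX_TOKEN = 2 ** 63 - 1  # Murmur3Partitioner maximum token


def split_token_range(n, lo=MIN_TOKEN, hi=MAX_TOKEN):
    """Split (lo, hi] into n contiguous (start, end] subranges.

    Python integers are unbounded, so hi - lo cannot overflow here.
    """
    span = hi - lo
    bounds = [lo + span * i // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def daily_commands(day, ranges, days=5, keyspace="my_keyspace", full=True):
    """Build the nodetool commands for one day of a `days`-day cycle.

    The last day also picks up any remainder if len(ranges) % days != 0.
    """
    per_day = len(ranges) // days
    start = day * per_day
    end = len(ranges) if day == days - 1 else start + per_day
    flag = " -full" if full else ""
    return [
        f"nodetool repair{flag} -st {st} -et {et} {keyspace}"
        for st, et in ranges[start:end]
    ]


ranges = split_token_range(1600)
for cmd in daily_commands(0, ranges)[:2]:
    print(cmd)
```

Consecutive subranges share a boundary token, which matches the (start, end] semantics, so the cycle covers the ring without gaps or overlaps.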

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Bowen Song
From the source code I've read, by default Cassandra will run a cleanup of the system.repairs table every 10 minutes, and any row related to a repair that completed over 1 day ago will be removed automatically. I highly doubt that you have run 75,000 repairs in the 24 hours prior to shutting
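The point can be checked with a little arithmetic in Python. Assuming the schedule described in this thread (1600 subrange repairs spread evenly over 5 days), one system.repairs row per repair, and a purely time-based cleanup that drops rows for repairs completed more than a day ago, the table should only ever hold about one day's worth of rows. `rows_after_cleanup` is a hypothetical model; the real cleanup has additional conditions.

```python
DAY = 24 * 60 * 60  # seconds


def rows_after_cleanup(completed_at, now, retention=DAY):
    """Keep only rows whose repair completed no more than `retention` seconds ago."""
    return [t for t in completed_at if now - t <= retention]


# 1600 repairs evenly spaced over 5 days: one every 270 seconds
interval = 5 * DAY // 1600
completed = [i * interval for i in range(1600)]
remaining = rows_after_cleanup(completed, now=completed[-1])
print(len(remaining))  # 321
```

Under these assumptions the table would hold roughly 320 rows, so a count of 75,000 suggests rows are not being removed at all rather than being produced faster than the cleanup runs.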

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Paul Chandler
Hi Bowen, Yes, there do seem to be a lot of rows: on one of the upgraded clusters there are 75,000 rows. I have been experimenting on a test cluster, which has about a 5 minute pause and around 15,000 rows. If I clear the system.repairs table (by deleting the SSTables) then this does not pau

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Bowen Song
Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a slow-starting node; do you get a lot of rows? This is the most obvious loop run (indirectly) by ActiveRepairService.start(). On 24/01/2022 13:30, Romain Anselin wrote: Hi everyone, We generated a JFR profile of the st

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Romain Anselin
Hi everyone, We generated a JFR profile of the startup phase of Cassandra with Paul, and it would appear that the time is spent in the ActiveRepairSession within the main thread (11 minutes of execution of the "main" thread in his environment vs 15s in mine), which was introduced in CASSANDRA-

Re: Cassandra 4.0 hanging on restart

2022-01-19 Thread Paul Chandler
Hi Bowen, Thanks for the reply. These have been our normal shutdowns: we do a nodetool drain before restarting the service, so I would have thought there should not be any commit logs. However, there are these messages for one commit log, but it looks like it finished quickly and correctly: IN

Re: Cassandra 4.0 hanging on restart

2022-01-19 Thread Bowen Song
Nothing obvious from the logs you posted. Generally speaking, commit log replay is often the culprit when a node takes a long time to start. I have seen many nodes with large memtable and commit log size limits spend over half an hour replaying the commit log. I usually do a "nodetool flu

Cassandra 4.0 hanging on restart

2022-01-19 Thread Paul Chandler
Hi all, We have upgraded a couple of clusters from 3.11.6, and now we are having issues when we restart the nodes. The node will either hang or take 10-30 minutes to restart; these are the last messages we have in the system.log: INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,267 FileUtils.java:54