We don’t expose the JMX port outside localhost, so the last time we looked it 
was not possible. I see now there is the sidecar option, but it sounds like 
there are a number of caveats, particularly around resources, that may cause 
some issues with our setup. So at the moment Reaper does not seem like a quick 
win for us.
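
For context, a minimal sketch of the relevant cassandra-env.sh settings (these 
are the stock defaults, shown here only for illustration):

# cassandra-env.sh (stock defaults): JMX only listens on localhost
LOCAL_JMX=yes
JMX_PORT="7199"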

Thanks 

Paul

 

> On 26 Jan 2022, at 10:33, Bowen Song <bo...@bso.ng> wrote:
> 
> I'm glad that it fixed the problem. Now, may I interest you in Cassandra 
> Reaper <http://cassandra-reaper.io/>? In my experience it has managed the 
> load fairly well on large clusters.
> 
> On 26/01/2022 10:19, Paul Chandler wrote:
>> I changed the range repair to a full repair, reset repairedAt for all 
>> SSTables, and deleted the old data out of the system.repairs table. 
>> 
>> This did not create any new rows in the system.repairs table, and the node 
>> was able to restart without any problem, so this seems to be a solution.
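>> 
>> Concretely, the change was along these lines, reusing the token format from 
>> the earlier subrange example (token values illustrative):
>> 
>> nodetool -h localhost -p 7199 repair --full -pr -st +09152150533683801432 -et +09154639946886262655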
>> 
>> I am concerned about turning on full repair for some of our clusters and 
>> what that will do to the load, so I am now going to experiment with larger 
>> incremental range repairs to see if there is a sweet spot where this works 
>> well.
>> 
>> Paul
>> 
>>> On 25 Jan 2022, at 13:36, Bowen Song <bo...@bso.ng> wrote:
>>> 
>>> That would indicate the "isSuperseded(session)" call returned false. After 
>>> looking at the source code, it seems the subrange incremental repair is 
>>> likely causing this.
>>> 
>>> Would you mind trying either subrange full repair or full-range incremental 
>>> repair? You may need to reset the "repairedAt" value in all SSTables using 
>>> the "sstablerepairedset" tool if you decide to switch to subrange full 
>>> repairs.
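>>> 
>>> For reference, a rough sketch of that repairedAt reset, assuming the node 
>>> is stopped first (the keyspace name and paths are illustrative):
>>> 
>>> # Collect the data files for the keyspace, then mark them all unrepaired
>>> find /var/lib/cassandra/data/my_keyspace -name '*-Data.db' > /tmp/sstables.txt
>>> sstablerepairedset --really-set --is-unrepaired -f /tmp/sstables.txt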
>>> 
>>> 
>>> 
>>> On 25/01/2022 12:39, Paul Chandler wrote:
>>>> Hi Bowen,
>>>> 
>>>> Yes, there are a large number of "Skipping delete of FINALIZED 
>>>> LocalSession” messages.
>>>> 
>>>> We have a script that repairs ranges, stepping through the complete token 
>>>> range over 5 days; this should create 1600 ranges over the 5 days. It runs 
>>>> commands like this:
>>>> 
>>>> nodetool -h localhost -p 7199 repair -pr  -st +09152150533683801432 -et 
>>>> +09154639946886262655
>>>> 
>>>> I am also seeing lots of "Auto deleting repair session LocalSession” 
>>>> messages - these seem to be deleting the rows whose repairedAt value is 
>>>> more than 5 days old, so it seems like that part is working correctly, 
>>>> just taking 5 days to delete them. 
>>>> 
>>>> Thanks 
>>>> 
>>>> Paul
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 24 Jan 2022, at 22:12, Bowen Song <bo...@bso.ng> wrote:
>>>>> 
>>>>> From the source code I've read, by default Cassandra runs a clean-up of 
>>>>> the system.repairs table every 10 minutes, and any row related to a 
>>>>> repair that completed over 1 day ago is automatically removed. I highly 
>>>>> doubt that you ran 75,000 repairs in the 24 hours prior to shutting down 
>>>>> that node, because that would be nearly one repair every second.
>>>>> 
>>>>> Do you see any logs like these?
>>>>> 
>>>>> Auto failing timed out repair session...
>>>>> Skipping delete of FINALIZED LocalSession ... because it has not been 
>>>>> superseded by a more recent session
>>>>> Skipping delete of LocalSession ... because it still contains sstables
>>>>> They are the logs from the cleanup() method in 
>>>>> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416
>>>>>  which indicate a record was not deleted during the clean-up for one of a 
>>>>> number of reasons.
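>>>>> 
>>>>> A quick way to check, assuming the default log location:
>>>>> 
>>>>> grep -cE 'Auto failing timed out repair|Skipping delete of (FINALIZED )?LocalSession' /var/log/cassandra/system.log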
>>>>> 
>>>>> On 24/01/2022 19:45, Paul Chandler wrote:
>>>>>> Hi Bowen,
>>>>>> 
>>>>>> Yes, there does seem to be a lot of rows; on one of the upgraded 
>>>>>> clusters there are 75,000 rows.
>>>>>> 
>>>>>> I have been experimenting on a test cluster; this has a roughly 
>>>>>> 5-minute pause, and around 15,000 rows. 
>>>>>> 
>>>>>> If I clear the system.repairs table ( by deleting the sstables ) then it 
>>>>>> does not pause at all, so that seems to fix the problem. However, I 
>>>>>> don’t really understand the implications of just removing that data.
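>>>>>> 
>>>>>> For the record, roughly what I did on the test node (the repairs-* 
>>>>>> directory glob, the nb-* file prefix and the service name are from my 
>>>>>> node and may differ):
>>>>>> 
>>>>>> nodetool drain
>>>>>> systemctl stop cassandra
>>>>>> mkdir -p /tmp/system_repairs_backup
>>>>>> mv /var/lib/cassandra/data/system/repairs-*/nb-* /tmp/system_repairs_backup/
>>>>>> systemctl start cassandra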
>>>>>> 
>>>>>> Thanks 
>>>>>> 
>>>>>> Paul
>>>>>> 
>>>>>>> On 24 Jan 2022, at 18:50, Bowen Song <bo...@bso.ng> wrote:
>>>>>>> 
>>>>>>> Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a 
>>>>>>> slow-starting node - do you get a lot of rows? This is the most obvious 
>>>>>>> loop run (indirectly) by ActiveRepairService.start(). 
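>>>>>>> 
>>>>>>> If the full output is too noisy, a row count alone should tell you 
>>>>>>> enough (system.repairs is node-local, so this is a cheap query):
>>>>>>> 
>>>>>>> cqlsh -e "SELECT COUNT(*) FROM system.repairs;"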
>>>>>>> 
>>>>>>> On 24/01/2022 13:30, Romain Anselin wrote:
>>>>>>>> 
>>>>>>>> Hi everyone,
>>>>>>>> 
>>>>>>>> We generated a JFR profile of the startup phase of Cassandra with 
>>>>>>>> Paul, and it would appear that the time is spent in the 
>>>>>>>> ActiveRepairService within the main thread (11 min of execution of the 
>>>>>>>> "main" thread in his environment, vs 15 s in mine). This was 
>>>>>>>> introduced in CASSANDRA-9143, based on a git blame of the source code: 
>>>>>>>> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/CassandraDaemon.java#L381
>>>>>>>> 
>>>>>>>> That seems to match the gap we see in the logs, where the time is 
>>>>>>>> spent just before the "Preloaded" statement, which comes just after 
>>>>>>>> that point in the CassandraDaemon code.
>>>>>>>> 
>>>>>>>> INFO  [main] 2022-01-20 09:44:40,198  StorageService.java:830 - Token 
>>>>>>>> metadata: Normal Tokens:
>>>>>>>> ... <skipping non "main" thread logs> ...
>>>>>>>> WARN  [Messaging-EventLoop-3-1] 2022-01-20 09:45:13,243  
>>>>>>>> NoSpamLogger.java:95 - 
>>>>>>>> /IP1:7000->/IP1:7000-SMALL_MESSAGES-[no-channel] dropping message of 
>>>>>>>> type SCHEMA_VERSION_REQ whose timeout expired before reaching the 
>>>>>>>> network
>>>>>>>> INFO  [main] 2022-01-20 09:55:01,134  QueryProcessor.java:150 - 
>>>>>>>> Preloaded 0 prepared statements
>>>>>>>> 
>>>>>>>> It remains to determine in more detail what that is doing and why 
>>>>>>>> it's taking longer and longer on startup. 
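>>>>>>>> 
>>>>>>>> For anyone who wants to reproduce the profile, one simple way to 
>>>>>>>> record startup (assuming JDK 11+; the duration and file path are 
>>>>>>>> illustrative) is a flight-recorder flag in conf/jvm11-server.options:
>>>>>>>> 
>>>>>>>> -XX:StartFlightRecording=duration=20m,filename=/tmp/cassandra-startup.jfr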
>>>>>>>> 
>>>>>>>> We also exported the sstablemetadata from all sstables on one node, 
>>>>>>>> and at this stage we can see we have 300 sstables out of 577 with 
>>>>>>>> "Repaired at:" set.
>>>>>>>> 
>>>>>>>> cd /var/lib/cassandra/data
>>>>>>>> find . -name '*Data*' | while read datf; do echo ++++ $datf ++++; sudo 
>>>>>>>> -u cassandra sstablemetadata $datf; done >> ~/sstablemetadata.txt
>>>>>>>> cqlsh -e "paging off; select * from system.repairs" >> ~/repairs.out
>>>>>>>> 
>>>>>>>> $ egrep 'Repaired at: 1' sstablemetadata.txt | wc -l
>>>>>>>> 300
>>>>>>>> $ egrep 'Repaired at:' sstablemetadata.txt | wc -l
>>>>>>>> 577
>>>>>>>> 
>>>>>>>> More info to come
>>>>>>>> 
>>>>>>>> Regards - Romain
>>>>>>>> 
>>>>>>>> On 19/01/2022 13:10, Paul Chandler wrote:
>>>>>>>>> Hi Bowen,
>>>>>>>>> 
>>>>>>>>> Thanks for the reply. These have been normal shutdowns, so we do a 
>>>>>>>>> nodetool drain before restarting the service; I would therefore have 
>>>>>>>>> thought there should not be any commit logs to replay.
>>>>>>>>> 
>>>>>>>>> However, there are these messages for one commit log, but it looks 
>>>>>>>>> like it finished quickly and correctly:
>>>>>>>>> 
>>>>>>>>> INFO  [main] 2022-01-19 10:08:22,811  CommitLog.java:173 - Replaying 
>>>>>>>>> /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
>>>>>>>>> WARN  [main] 2022-01-19 10:08:22,839  CommitLogReplayer.java:305 - 
>>>>>>>>> Origin of 2 sstables is unknown or doesn't match the local node; 
>>>>>>>>> commitLogIntervals for them were ignored
>>>>>>>>> [repeated about 10 times]
>>>>>>>>> WARN  [main] 2022-01-19 10:08:22,842  CommitLogReplayer.java:305 - 
>>>>>>>>> Origin of 3 sstables is unknown or doesn't match the local node; 
>>>>>>>>> commitLogIntervals for them were ignored
>>>>>>>>> INFO  [main] 2022-01-19 10:08:22,853  CommitLogReader.java:256 - 
>>>>>>>>> Finished reading 
>>>>>>>>> /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
>>>>>>>>> INFO  [main] 2022-01-19 10:08:22,882  CommitLog.java:175 - Log replay 
>>>>>>>>> complete, 0 replayed mutations 
>>>>>>>>> 
>>>>>>>>> Thanks 
>>>>>>>>> 
>>>>>>>>> Paul
>>>>>>>>> 
>>>>>>>>>> On 19 Jan 2022, at 13:03, Bowen Song <bo...@bso.ng> wrote:
>>>>>>>>>> 
>>>>>>>>>> Nothing obvious from the logs you posted.
>>>>>>>>>> 
>>>>>>>>>> Generally speaking, replaying the commit log is often the culprit 
>>>>>>>>>> when a node takes a long time to start. I have seen many nodes with 
>>>>>>>>>> large memtable and commit log size limits spend over half an hour 
>>>>>>>>>> replaying the commit log. I usually do a "nodetool flush" before 
>>>>>>>>>> shutting down a node to help speed up the start time if the 
>>>>>>>>>> shutdown is planned. There isn't much you can do about an unexpected 
>>>>>>>>>> shutdown, such as a server crash; when that happens, the only 
>>>>>>>>>> reasonable thing to do is wait for the commit log replay to finish. 
>>>>>>>>>> You should see log entries related to replaying commit logs if this 
>>>>>>>>>> is the case.
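>>>>>>>>>> 
>>>>>>>>>> For a planned restart, the sequence I'd suggest is roughly this (the 
>>>>>>>>>> service name is assumed):
>>>>>>>>>> 
>>>>>>>>>> nodetool flush                # write memtables out to SSTables first
>>>>>>>>>> nodetool drain                # stop accepting writes, flush what's left
>>>>>>>>>> systemctl restart cassandra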
>>>>>>>>>> 
>>>>>>>>>> However, if you don't find any logs related to replaying commit 
>>>>>>>>>> logs, the cause may be completely different.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 19/01/2022 11:54, Paul Chandler wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> We have upgraded a couple of clusters from 3.11.6, and now we are 
>>>>>>>>>>> having issues when we restart the nodes.
>>>>>>>>>>> 
>>>>>>>>>>> A node will either hang or take 10-30 minutes to restart; these 
>>>>>>>>>>> are the last messages we have in the system.log:
>>>>>>>>>>> 
>>>>>>>>>>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,267  
>>>>>>>>>>> FileUtils.java:545 - Deleting file during startup: 
>>>>>>>>>>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
>>>>>>>>>>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  
>>>>>>>>>>> LogTransaction.java:240 - Unfinished transaction log, deleting 
>>>>>>>>>>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
>>>>>>>>>>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  
>>>>>>>>>>> FileUtils.java:545 - Deleting file during startup: 
>>>>>>>>>>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
>>>>>>>>>>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,269  
>>>>>>>>>>> LogTransaction.java:240 - Unfinished transaction log, deleting 
>>>>>>>>>>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
>>>>>>>>>>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,270  
>>>>>>>>>>> FileUtils.java:545 - Deleting file during startup: 
>>>>>>>>>>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
>>>>>>>>>>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,272  
>>>>>>>>>>> LogTransaction.java:240 - Unfinished transaction log, deleting 
>>>>>>>>>>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-85
>>>>>>>>>>> 8854746758.log
>>>>>>>>>>> INFO  [MemtableFlushWriter:2] 2022-01-19 10:08:23,289  
>>>>>>>>>>> LogTransaction.java:240 - Unfinished transaction log, deleting 
>>>>>>>>>>> /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log
>>>>>>>>>>> 
>>>>>>>>>>> The debug log has messages from DiskBoundaryManager.java at the 
>>>>>>>>>>> same time, then it just has the following messages:
>>>>>>>>>>> 
>>>>>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430  
>>>>>>>>>>> SSLFactory.java:354 - Checking whether certificates have been 
>>>>>>>>>>> updated []
>>>>>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431  
>>>>>>>>>>> SSLFactory.java:354 - Checking whether certificates have been 
>>>>>>>>>>> updated []
>>>>>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431  
>>>>>>>>>>> SSLFactory.java:354 - Checking whether certificates have been 
>>>>>>>>>>> updated []
>>>>>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431  
>>>>>>>>>>> SSLFactory.java:354 - Checking whether certificates have been 
>>>>>>>>>>> updated []
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> It seems to get worse after each restart, until it gets to the 
>>>>>>>>>>> state where it just hangs, and then the only thing to do is to 
>>>>>>>>>>> re-bootstrap the node.
>>>>>>>>>>> 
>>>>>>>>>>> Once I had re-bootstrapped all the nodes in the cluster, I thought 
>>>>>>>>>>> the cluster was stable, but now one of the nodes is hanging again.
>>>>>>>>>>> 
>>>>>>>>>>> Does anyone have any ideas what is causing the problem?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> 
>>>>>>>>>>> Paul Chandler
>>>>>>>>>>> 
>>>>>> 
>>>> 
>> 
