Hi Bowen,

Yes, there are a large number of "Skipping delete of FINALIZED LocalSession" messages.
We have a script that repairs ranges, stepping through the complete token range in 5 days; this should create 1600 ranges over the 5 days. It runs commands like this (a sketch of this kind of driver script is included further down):

nodetool -h localhost -p 7199 repair -pr -st +09152150533683801432 -et +09154639946886262655

I am also seeing lots of "Auto deleting repair session LocalSession" messages - these seem to be deleting the rows with a repairedAt value more than 5 days old, so it seems like that part is working correctly, it is just taking 5 days to delete them.

Thanks

Paul

> On 24 Jan 2022, at 22:12, Bowen Song <bo...@bso.ng> wrote:
> 
> From the source code I've read, by default Cassandra will run a cleanup of
> the system.repairs table every 10 minutes; any row related to a repair that
> completed over 1 day ago will be automatically removed. I highly doubt
> that you have run 75,000 repairs in the 24 hours prior to shutting down that
> node, because that's nearly one repair every second.
> 
> Do you see any logs like these?
> 
> Auto failing timed out repair session...
> Skipping delete of FINALIZED LocalSession ... because it has not been
> superseded by a more recent session
> Skipping delete of LocalSession ... because it still contains sstables
> 
> They are the logs from the cleanup() method in
> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416
> which indicate that a record was not deleted during cleanup for one of a
> number of reasons.
> 
> On 24/01/2022 19:45, Paul Chandler wrote:
>> Hi Bowen,
>> 
>> Yes, there do seem to be a lot of rows; on one of the upgraded clusters
>> there are 75,000 rows.
>> 
>> I have been experimenting on a test cluster; this has about a 5 minute
>> pause, and around 15,000 rows.
>> 
>> If I clear the system.repairs table (by deleting the sstables) then this
>> does not pause at all, so that seems to fix the problem. However, I don't
>> really understand what the implications are of just removing that data.
>> 
>> Thanks
>> 
>> Paul
>> 
>>> On 24 Jan 2022, at 18:50, Bowen Song <bo...@bso.ng> wrote:
>>> 
>>> Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a
>>> slow-starting node - do you get a lot of rows? This is the most obvious
>>> loop run (indirectly) by ActiveRepairService.start().
>>> 
>>> On 24/01/2022 13:30, Romain Anselin wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> We generated a JFR profile of the startup phase of Cassandra with Paul,
>>>> and it would appear that the time is spent in the ActiveRepairService
>>>> within the main thread (11 minutes of execution of the "main" thread in
>>>> his environment, vs 15s in mine). This was introduced in CASSANDRA-9143,
>>>> based on a "git blame" of the source code:
>>>> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/CassandraDaemon.java#L381
>>>> 
>>>> That seems to match the gap we see in the logs, where the time is spent
>>>> just before the "Preloaded" statement, which comes just after that point
>>>> in the CassandraDaemon code.
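For reference, here is a minimal sketch of the kind of driver script Paul describes above. The slice arithmetic, pacing, and host details are assumptions, not his actual script; the nodetool invocation mirrors the command he quoted:

#!/usr/bin/env bash
# Walk the full Murmur3 token ring in 1600 subrange repairs, paced so that
# one complete pass takes roughly 5 days (numbers per Paul's description).

SLICES=1600
SLEEP=$(( 5 * 24 * 3600 / SLICES ))   # ~270s between slices -> ~5 days/pass
RING="18446744073709551616"           # 2^64, the width of the Murmur3 ring
ST="-9223372036854775808"             # minimum Murmur3 token

for i in $(seq 1 "$SLICES"); do
    # use bc: the ring width exceeds bash's signed 64-bit arithmetic
    ET=$(echo "$ST + $RING / $SLICES" | bc)
    [ "$i" -eq "$SLICES" ] && ET="9223372036854775807"   # close the last slice
    nodetool -h localhost -p 7199 repair -pr -st "$ST" -et "$ET"
    ST="$ET"
    sleep "$SLEEP"
done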
>>>> INFO [main] 2022-01-20 09:44:40,198 StorageService.java:830 - Token metadata: Normal Tokens:
>>>> ... <skipping non "main" thread logs> ...
>>>> WARN [Messaging-EventLoop-3-1] 2022-01-20 09:45:13,243 NoSpamLogger.java:95 - /IP1:7000->/IP1:7000-SMALL_MESSAGES-[no-channel] dropping message of type SCHEMA_VERSION_REQ whose timeout expired before reaching the network
>>>> INFO [main] 2022-01-20 09:55:01,134 QueryProcessor.java:150 - Preloaded 0 prepared statements
>>>> 
>>>> It remains to determine what that code is doing in more detail, and why
>>>> it is taking longer and longer on each startup.
>>>> 
>>>> We also exported the sstablemetadata from all sstables on one node, and
>>>> at this stage we can see we have 300 sstables out of 577 with "Repaired
>>>> at:" set:
>>>> 
>>>> cd /var/lib/cassandra/data
>>>> find . -name '*Data*' | while read datf; do echo ++++ $datf ++++; sudo -u cassandra sstablemetadata $datf; done >> ~/sstablemetadata.txt
>>>> cqlsh -e "paging off; select * from system.repairs" >> ~/repairs.out
>>>> 
>>>> $ egrep 'Repaired at: 1' sstablemetadata.txt | wc -l
>>>> 300
>>>> $ egrep 'Repaired at:' sstablemetadata.txt | wc -l
>>>> 577
>>>> 
>>>> More info to come.
>>>> 
>>>> Regards - Romain
>>>> 
>>>> On 19/01/2022 13:10, Paul Chandler wrote:
>>>>> Hi Bowen,
>>>>> 
>>>>> Thanks for the reply. These have been our normal shutdowns, so we do a
>>>>> nodetool drain before restarting the service, so I would have thought
>>>>> there should not be any commit logs to replay.
>>>>> 
>>>>> However, there are these messages for one commit log, but it looks like
>>>>> it finished quickly and correctly:
>>>>> 
>>>>> INFO [main] 2022-01-19 10:08:22,811 CommitLog.java:173 - Replaying /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
>>>>> WARN [main] 2022-01-19 10:08:22,839 CommitLogReplayer.java:305 - Origin of 2 sstables is unknown or doesn't match the local node; commitLogIntervals for them were ignored
>>>>> ... repeated about 10 times ...
>>>>> WARN [main] 2022-01-19 10:08:22,842 CommitLogReplayer.java:305 - Origin of 3 sstables is unknown or doesn't match the local node; commitLogIntervals for them were ignored
>>>>> INFO [main] 2022-01-19 10:08:22,853 CommitLogReader.java:256 - Finished reading /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
>>>>> INFO [main] 2022-01-19 10:08:22,882 CommitLog.java:175 - Log replay complete, 0 replayed mutations
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Paul
>>>>> 
>>>>>> On 19 Jan 2022, at 13:03, Bowen Song <bo...@bso.ng> wrote:
>>>>>> 
>>>>>> Nothing obvious from the logs you posted.
>>>>>> 
>>>>>> Generally speaking, replaying the commit log is often the culprit when
>>>>>> a node takes a long time to start. I have seen many nodes with large
>>>>>> memtable and commit log size limits spend over half an hour replaying
>>>>>> the commit log. I usually do a "nodetool flush" before shutting down
>>>>>> the node to help speed up the start time if the shutdown is planned.
>>>>>> There isn't much you can do about an unexpected shutdown, such as a
>>>>>> server crash; when that happens, the only reasonable thing to do is
>>>>>> wait for the commit log replay to finish. You should see log entries
>>>>>> related to replaying commit logs if this is the case.
>>>>>> 
>>>>>> However, if you don't find any logs related to replaying commit logs,
>>>>>> the cause may be completely different.
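A minimal sketch of the planned-shutdown sequence Bowen describes; the systemd unit name is an assumption for a typical package install:

nodetool flush                 # write memtables to sstables, so replay has little to do
nodetool drain                 # stop accepting writes and flush the commit log
sudo systemctl stop cassandra  # then stop the service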
>>>>>> On 19/01/2022 11:54, Paul Chandler wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> We have upgraded a couple of clusters from 3.11.6, and now we are
>>>>>>> having issues when we restart the nodes.
>>>>>>> 
>>>>>>> A node will either hang or take 10-30 minutes to restart; these are
>>>>>>> the last messages we have in the system.log:
>>>>>>> 
>>>>>>> INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,267 FileUtils.java:545 - Deleting file during startup: /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
>>>>>>> INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,268 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
>>>>>>> INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,268 FileUtils.java:545 - Deleting file during startup: /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
>>>>>>> INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,269 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
>>>>>>> INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,270 FileUtils.java:545 - Deleting file during startup: /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
>>>>>>> INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,272 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-858854746758.log
>>>>>>> INFO [MemtableFlushWriter:2] 2022-01-19 10:08:23,289 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log
>>>>>>> 
>>>>>>> The debug log has messages from DiskBoundaryManager.java at the same
>>>>>>> time, then it just has the following messages:
>>>>>>> 
>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430 SSLFactory.java:354 - Checking whether certificates have been updated []
>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431 SSLFactory.java:354 - Checking whether certificates have been updated []
>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431 SSLFactory.java:354 - Checking whether certificates have been updated []
>>>>>>> DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431 SSLFactory.java:354 - Checking whether certificates have been updated []
>>>>>>> 
>>>>>>> It seems to get worse after each restart, until it gets to the state
>>>>>>> where it just hangs, and then the only thing to do is to re-bootstrap
>>>>>>> the node.
>>>>>>> 
>>>>>>> Once I had re-bootstrapped all the nodes in the cluster, I thought the
>>>>>>> cluster was stable, but now one of the nodes is hanging again.
>>>>>>> 
>>>>>>> Does anyone have any ideas what is causing the problem?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> Paul Chandler
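Pulling the diagnostics from this thread together, a quick triage sketch for a slow-starting node; paths are package defaults, and the log patterns are taken from the messages quoted above:

# 1. How many rows will ActiveRepairService.start() have to iterate over?
cqlsh -e "paging off; select count(*) from system.repairs;"

# 2. Is the periodic cleanup task declining to delete finalized sessions?
grep -E 'Skipping delete of (FINALIZED )?LocalSession|Auto deleting repair session' \
    /var/log/cassandra/system.log | tail

# 3. How many sstables still carry incremental repair metadata?
find /var/lib/cassandra/data -name '*Data.db' \
    -exec sstablemetadata {} \; 2>/dev/null | grep -c 'Repaired at: 1'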