Yes, I understand that exposing JMX securely outside localhost involves
a fair amount of work, and that the sidecar option will use more RAM per
Cassandra node, which may or may not be an issue depending on the
circumstances. It is not a quick win if neither of those two options is
suitable for your situation. However, based on my past experience with
cron jobs and "nodetool repair", the time wasted implementing,
maintaining, monitoring and occasionally troubleshooting and fixing them
easily justifies the one-off work of exposing JMX securely and setting
up Cassandra Reaper. Of course, mileage varies: you may find it easier
and quicker to deal with nodetool cron jobs than with the other options.
On 26/01/2022 11:00, Paul Chandler wrote:
We don’t expose the JMX port outside localhost, so last time we looked
it was not possible. I see now there is the sidecar option, but it
sounds like there are a number of caveats, particularly around
resources, that may cause some issues with our setup. So at the moment
Reaper does not seem like a quick win for us.
Thanks
Paul
On 26 Jan 2022, at 10:33, Bowen Song <bo...@bso.ng> wrote:
I'm glad that it fixed the problem. Now, may I interest you with
Cassandra Reaper <http://cassandra-reaper.io/>? In my experience it
has managed the load fairly well on large clusters.
On 26/01/2022 10:19, Paul Chandler wrote:
I changed the range repair to a full repair, reset the
repairedAt for all SSTables and deleted the old data out of the
system.repairs table.
This did not create any new rows in the system.repairs table,
and the node was able to restart without any problem, so this seems
to be a solution.
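For reference, the commands involved were roughly along these lines
(the tokens and paths are illustrative rather than copied from our
scripts, and the system.repairs sstables can only be removed while
Cassandra is stopped on the node):

# run the subrange repairs as full instead of incremental
nodetool -h localhost -p 7199 repair -full -pr -st +09152150533683801432 -et +09154639946886262655
# with the node stopped, clear system.repairs by deleting its sstables
rm /var/lib/cassandra/data/system/repairs-*/nb-*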
I am concerned about what turning on full repair for some of our
clusters will do to the load, so I am now going to experiment with
larger incremental range repairs to see if there is a sweet spot where
this works OK.
Paul
On 25 Jan 2022, at 13:36, Bowen Song <bo...@bso.ng> wrote:
That would indicate the "isSuperseded(session)" call returned
false. After looking at the source code, it seems the subrange
incremental repair is likely causing this.
Would you mind trying either subrange full repair or full-range
incremental repair? You may need to reset the "repairedAt" value in
all SSTables using the "sstablerepairedset" tool if you decide to
move to subrange full repairs.
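For example, with Cassandra stopped on the node, something along these
lines (the path below is just the default data location, adjust it to
your layout):

sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/<keyspace>/<table>-*/nb-*-big-Data.db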
On 25/01/2022 12:39, Paul Chandler wrote:
Hi Bowen,
Yes, there are a large number of "Skipping delete of FINALIZED
LocalSession" messages.
We have a script that repairs subranges, stepping through the complete
token range over 5 days; this should create 1600 ranges over the
5 days, and it runs commands like this:
nodetool -h localhost -p 7199 repair -pr -st +09152150533683801432 -et +09154639946886262655
I am also seeing lots of "Auto deleting repair session
LocalSession" messages - these seem to be deleting the rows with a
repairedAt value more than 5 days old, so it seems like that part
is working correctly, just taking 5 days to delete them.
Thanks
Paul
On 24 Jan 2022, at 22:12, Bowen Song <bo...@bso.ng> wrote:
From the source code I've read, by default Cassandra runs a
clean-up of the system.repairs table every 10 minutes, and any row
related to a repair that completed over 1 day ago is
automatically removed. I highly doubt that you ran 75,000
repairs in the 24 hours prior to shutting down that node, because
that's nearly one repair every second.
Do you see any logs like these?
* Auto failing timed out repair session...
* Skipping delete of FINALIZED LocalSession ... because it has
not been superseded by a more recent session
* Skipping delete of LocalSession ... because it still contains
sstables
They are the logs from the cleanup() method in
https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416
which indicate a record was not deleted during the cleanup
for a number of reasons.
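If it helps, a quick way to count them (assuming the default log
location) is something like:

grep -c 'Auto failing timed out repair session' /var/log/cassandra/system.log
grep -c 'Skipping delete of FINALIZED LocalSession' /var/log/cassandra/system.log
grep -c 'Skipping delete of LocalSession' /var/log/cassandra/system.log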
On 24/01/2022 19:45, Paul Chandler wrote:
Hi Bowen,
Yes, there do seem to be a lot of rows; on one of the upgraded
clusters there are 75,000 rows.
I have been experimenting on a test cluster; this has about a 5
minute pause, and around 15,000 rows.
If I clear the system.repairs table (by deleting the sstables)
then the node does not pause at all, so that seems to fix the problem.
However, I don’t really understand the implications of just removing
that data.
Thanks
Paul
On 24 Jan 2022, at 18:50, Bowen Song <bo...@bso.ng> wrote:
Hmm, interesting... Try "select * from system.repairs;" in
cqlsh on a slow-starting node - do you get a lot of rows? This
is the most obvious loop run (indirectly) by
ActiveRepairService.start().
On 24/01/2022 13:30, Romain Anselin wrote:
Hi everyone,
We generated a JFR profile of the startup phase of Cassandra with Paul, and it would appear that
the time is spent in ActiveRepairService within the main thread (11 min of execution of the
"main" thread in his environment, vs 15s in mine), which was introduced in
CASSANDRA-9143 based on a "git blame" view of the source code:
https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/CassandraDaemon.java#L381
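As a side note, for anyone wanting to reproduce this: the startup profile was captured with
Java Flight Recorder, and adding an option along these lines to the JVM settings before the
restart should be enough on recent JDKs (the duration and file name are only examples):

-XX:StartFlightRecording=duration=15m,filename=/tmp/cassandra-startup.jfr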
That ActiveRepairService time seems to match the gap we see in the logs, where the time is
spent just before the "Preloaded" statement, which comes just after that call in the
CassandraDaemon code.
INFO [main] 2022-01-20 09:44:40,198 StorageService.java:830 - Token metadata:
Normal Tokens:
... <skipping non "main" thread logs> ...
WARN [Messaging-EventLoop-3-1] 2022-01-20 09:45:13,243 NoSpamLogger.java:95 -
/IP1:7000->/IP1:7000-SMALL_MESSAGES-[no-channel] dropping message of type
SCHEMA_VERSION_REQ whose timeout expired before reaching the network
INFO [main] 2022-01-20 09:55:01,134 QueryProcessor.java:150 - Preloaded 0
prepared statements
It remains to be determined what exactly that code is doing and why it is taking
longer and longer at startup.
We also exported the sstablemetadata from all sstables on one node, and at this stage we
can see we have 300 sstables out of 577 with "Repaired at:" set.
cd /var/lib/cassandra/data
find . -name '*Data*' | while read datf; do echo ++++ $datf ++++; sudo -u cassandra sstablemetadata $datf; done >> ~/sstablemetadata.txt
cqlsh -e "paging off; select * from system.repairs" >> ~/repairs.out
$ egrep 'Repaired at: 1' sstablemetadata.txt | wc -l
300
$ egrep 'Repaired at:' sstablemetadata.txt | wc -l
577
More info to come
Regards - Romain
On 19/01/2022 13:10, Paul Chandler wrote:
Hi Bowen,
Thanks for the reply. These have been our normal shutdowns, so we do a nodetool
drain before restarting the service, so I would have thought there should not
be any commit logs.
However, there are these messages for one commit log, but it looks like it
finished quickly and correctly:
INFO [main] 2022-01-19 10:08:22,811 CommitLog.java:173 - Replaying
/var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
WARN [main] 2022-01-19 10:08:22,839 CommitLogReplayer.java:305 - Origin of 2
sstables is unknown or doesn't match the local node; commitLogIntervals for
them were ignored
Repeated about 10 times
WARN [main] 2022-01-19 10:08:22,842 CommitLogReplayer.java:305 - Origin of 3
sstables is unknown or doesn't match the local node; commitLogIntervals for
them were ignored
INFO [main] 2022-01-19 10:08:22,853 CommitLogReader.java:256 - Finished
reading /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
INFO [main] 2022-01-19 10:08:22,882 CommitLog.java:175 - Log replay complete,
0 replayed mutations
Thanks
Paul
On 19 Jan 2022, at 13:03, Bowen Song <bo...@bso.ng> wrote:
Nothing obvious from the logs you posted.
Generally speaking, replaying the commit log is often the culprit when a node takes a long
time to start. I have seen many nodes with a large memtable and commit log size limit
spend over half an hour replaying the commit log. I usually do a "nodetool
flush" before shutting down a node to help speed up the start time if the shutdown
is planned. There isn't much you can do about an unexpected shutdown, such as a server
crash; when that happens, the only reasonable thing to do is wait for the commit log
replay to finish. You should see log entries related to replaying commit logs if this is
the case.
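For a planned restart, the sequence would typically be something like the following
(the service name depends on how Cassandra was installed):

nodetool flush
nodetool drain
sudo systemctl restart cassandra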
However, if you don't find any logs related to replaying commit logs, the cause
may be completely different.
On 19/01/2022 11:54, Paul Chandler wrote:
Hi all,
We have upgraded a couple of clusters from 3.11.6, and now we are having issues
when we restart the nodes.
A node will either hang or take 10-30 minutes to restart; these are the last
messages we have in the system.log:
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,267 FileUtils.java:545 -
Deleting file during startup:
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,268 LogTransaction.java:240 -
Unfinished transaction log, deleting
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,268 FileUtils.java:545 -
Deleting file during startup:
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,269 LogTransaction.java:240 -
Unfinished transaction log, deleting
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,270 FileUtils.java:545 -
Deleting file during startup:
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,272 LogTransaction.java:240 -
Unfinished transaction log, deleting
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-858854746758.log
INFO [MemtableFlushWriter:2] 2022-01-19 10:08:23,289 LogTransaction.java:240
- Unfinished transaction log, deleting
/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log
The debug log has messages from DiskBoundaryManager.java at the same time, then
it just has the following messages:
DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430 SSLFactory.java:354 -
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431 SSLFactory.java:354 -
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431 SSLFactory.java:354 -
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431 SSLFactory.java:354 -
Checking whether certificates have been updated []
It seems to get worse after each restart, and then it gets to the state where
it just hangs, and the only thing to do is to re-bootstrap the node.
Once I had re-bootstrapped all the nodes in the cluster, I thought the cluster
was stable, but I have now got a case where one of the nodes is hanging
again.
Does anyone have any ideas what is causing the problems?
Thanks
Paul Chandler