Hi all,

We have upgraded a couple of clusters from 3.11.6, and we are now having issues when we restart the nodes.
A node will either hang or take 10-30 minutes to restart. These are the last messages we have in the system.log:

INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,267 FileUtils.java:545 - Deleting file during startup: /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,268 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,268 FileUtils.java:545 - Deleting file during startup: /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,269 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,270 FileUtils.java:545 - Deleting file during startup: /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
INFO [NonPeriodicTasks:1] 2022-01-19 10:08:23,272 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-858854746758.log
INFO [MemtableFlushWriter:2] 2022-01-19 10:08:23,289 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log

The debug log has messages from DiskBoundaryManager.java at the same time, and after that it only has the following messages:

DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430 SSLFactory.java:354 - Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431 SSLFactory.java:354 - Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431 SSLFactory.java:354 - Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431 SSLFactory.java:354 - Checking whether certificates have been updated []

The restarts seem to get worse each time, until the node reaches a state where it just hangs, and then the only thing to do is to re-bootstrap it. Once I had re-bootstrapped all the nodes in the cluster, I thought the cluster was stable, but I now have a case where one of the nodes is hanging again.

Does anyone have any idea what is causing the problems?

Thanks
Paul Chandler