Hi Bowen, would it be possible to share a full thread dump?
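For reference, a minimal sketch of capturing one (assuming the JVM process name matches "CassandraDaemon" and the JDK's jstack is on the PATH; adjust both for your setup):

```shell
# Sketch: capture a full thread dump from a running Cassandra node.
# The process pattern "CassandraDaemon" is an assumption; adjust as needed.
PID=$(pgrep -f CassandraDaemon | head -n 1)
if [ -n "$PID" ]; then
  # -l also lists ownable synchronizers, which helps when threads are
  # blocked on latches or barriers.
  jstack -l "$PID" > "threaddump-$(date +%Y%m%d-%H%M%S).txt"
  echo "thread dump written"
else
  echo "no Cassandra process found"
fi
```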
Regards,
Dmitry
On Tue, 5 Nov 2024 at 12:12, Bowen Song via user
<user@cassandra.apache.org> wrote:
Hi all,
We have a cluster running Cassandra 4.1.1, and we are seeing
memtable flushes randomly getting stuck. This has happened
twice in the last 10 days, to two different nodes in the same
cluster. It started after we enabled CDC, and each time the
flush got stuck there was at least one repair running that
involved the affected node.
The signs of the stuck memtable flush are most obvious in the
"StatusLogger" logs.
At the beginning, the MemtablePostFlush and MemtableFlushWriter
pools get stuck, with 1 and 2 active tasks respectively, and a
small number of pending tasks.
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:65 - Pool Name                     Active  Pending  Completed  Blocked  All Time Blocked
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - ReadStage                          0        0   34052333        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - CompactionExecutor                 0        0    1019777        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MutationStage                      0        0   14930764        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MemtableReclaimMemory              0        0      21877        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - PendingRangeCalculator             0        0        177        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Repair#61                          0        0       1344        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - GossipStage                        0        0     889452        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - SecondaryIndexManagement           0        0          1        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - HintsDispatcher                    0        0         19        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Repair-Task                        0        0         65        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - RequestResponseStage               0        0   44447834        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Native-Transport-Requests          0        0    8967921        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MigrationStage                     0        0          5        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MemtableFlushWriter                2       10      21781        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MemtablePostFlush                  1       11      47856        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - PerDiskMemtableFlushWriter_0       0        0      21769        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Sampler                            0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - ValidationExecutor                 0        0      36651        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - ViewBuildExecutor                  0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - InternalResponseStage              0        0        240        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - AntiEntropyStage                   1     1773     120067        0                 0
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - CacheCleanupExecutor               0        0          0        0                 0
Over time, the number of pending tasks slowly grows, while the
number of completed tasks does not increase at all.
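The pools of interest can be picked out of these dumps mechanically; a rough sketch, assuming the StatusLogger field layout shown above (pool name followed by Active, Pending, Completed, Blocked, All Time Blocked):

```shell
# Rough sketch: list pools with a nonzero Active or Pending count.
# The sample lines below are copied from the dumps in this thread.
sample='ReadStage 0 0 40186426 0 0
MemtablePostFlush 1 54 47856 0 0
MemtableFlushWriter 2 53 21781 0 0
AntiEntropyStage 1 9694 120067 0 0'
echo "$sample" | awk '$2 > 0 || $3 > 0 { printf "%s active=%s pending=%s\n", $1, $2, $3 }'
```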
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:65 - Pool Name                     Active  Pending  Completed  Blocked  All Time Blocked
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - ReadStage                          0        0   39905462        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - CompactionExecutor                 0        0    1170100        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - MutationStage                      0        0   16976992        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#76                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#74                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#72                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - GossipStage                        0        0    1026877        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair-Task                        0        0         83        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - RequestResponseStage               0        0   51122191        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - ValidationExecutor                 0        0      36651        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - MemtablePostFlush                  1       54      47856        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - MemtableFlushWriter                2       53      21781        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - CacheCleanupExecutor               0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#67                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - MemtableReclaimMemory              0        0      21877        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - PendingRangeCalculator             0        0        177        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#82                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#61                          0        0       1344        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - SecondaryIndexManagement           0        0          1        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - HintsDispatcher                    0        0         20        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Native-Transport-Requests          0        0   10145020        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - MigrationStage                     0        0          5        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Sampler                            0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - PerDiskMemtableFlushWriter_0       0        0      21769        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,942 StatusLogger.java:69 - ViewBuildExecutor                  0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,942 StatusLogger.java:69 - InternalResponseStage              0        0        996        0                 0
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,942 StatusLogger.java:69 - AntiEntropyStage                   1     9694     120067        0                 0
Then things get worse: the MutationStage gets stuck too.
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:65 - Pool Name                     Active  Pending  Completed  Blocked  All Time Blocked
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - ReadStage                          0        0   40186426        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - CompactionExecutor                 0        0    1171479        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - MutationStage                    128   201414   16988212        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair#76                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair#74                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair#72                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - GossipStage                        0        0    1032988        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair-Task                        0        0         85        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - RequestResponseStage               0        0   51307497        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - ValidationExecutor                 0        0      36651        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - MemtablePostFlush                  1       54      47856        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - MemtableFlushWriter                2       53      21781        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - CacheCleanupExecutor               0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair#67                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - MemtableReclaimMemory              0        0      21877        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair#84                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - PendingRangeCalculator             0        0        177        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair#82                          0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Repair#61                          0        0       1344        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - SecondaryIndexManagement           0        0          1        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - HintsDispatcher                    0        0         20        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Native-Transport-Requests          0        0   10194549        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - MigrationStage                     0        0          5        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - Sampler                            0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - PerDiskMemtableFlushWriter_0       0        0      21769        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - ViewBuildExecutor                  0        0          0        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - InternalResponseStage              0        0       1087        0                 0
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:69 - AntiEntropyStage                   1    10364     120067        0                 0
Writes on this node start to time out and fail. If left
untouched, it only gets worse, eventually leading to a JVM OOM
and crash.
By inspecting the heap dump created at OOM, we can see that
both of the MemtableFlushWriter threads are stuck on line 1190
<https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1190>
in ColumnFamilyStore.java:
    // mark writes older than the barrier as blocking progress, permitting them to exceed our memory limit
    // if they are stuck waiting on it, then wait for them all to complete
    writeBarrier.markBlocking();
    writeBarrier.await(); // <----------- stuck here
And the MemtablePostFlush thread is stuck on line 1094
<https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1094>
in the same file.
    try
    {
        // we wait on the latch for the commitLogUpperBound to be set, and so that waiters
        // on this task can rely on all prior flushes being complete
        latch.await(); // <----------- stuck here
    }
Our top suspect is CDC interacting with repair, since this
started to happen shortly after we enabled CDC on the nodes,
and a repair was running each time. But we have not been able
to reproduce the issue in a testing cluster, and we don't know
what the next troubleshooting step should be. So I'm posting to
the mailing lists, hoping someone knows something about it or
can point me in the right direction.
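In the meantime, we are considering watching the flush pools and alerting when the Completed counter stops moving. A rough sketch (the sampling interval and the tpstats column positions are assumptions; adjust for your setup):

```shell
# Sketch: flag a possibly stuck MemtablePostFlush pool by sampling the
# Completed counter twice via nodetool tpstats.
get_completed() {
  nodetool tpstats 2>/dev/null | awk '$1 == "MemtablePostFlush" { print $4 }'
}
a=$(get_completed)
sleep 5   # short interval for illustration; something like 60s in practice
b=$(get_completed)
if [ -n "$a" ] && [ "$a" = "$b" ]; then
  echo "MemtablePostFlush Completed counter unchanged ($a); flush may be stuck"
fi
```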
p.s.: sorry about posting this to both the user & dev mailing
lists. It's an end-user issue, but it involves Cassandra
internals, so I couldn't decide which list is better suited.
Cheers,
Bowen
--
Dmitry Konstantinov