Of course, let me explain the situation. I have a general question, not
directly related to the “Unexplained stuck memtable flush” problem. I would
like to know how I can identify the moment when all nodes across all data
centers are in sync.

* It is a little tricky to simply wait e.g. 1 day, 2 days or longer for the
synchronization to finish.

My question is: do you see an easy way to check that an expected large
synchronization (e.g. after extending the cluster by adding a new data
center, or adding half as many new nodes again) has finished, from the
perspective of data sync? Thanks for sharing your best practices.

Regards,
Jiri
*From:* Bowen Song via user <user@cassandra.apache.org>
*Sent:* Tuesday, November 5, 2024 1:39 PM
*To:* user@cassandra.apache.org
*Cc:* Bowen Song <bo...@bso.ng>
*Subject:* Re: [External] Unexplained stuck memtable flush
Hi Jiri,
Thank you for taking a look at this issue. But I'm sorry, I don't
really understand your message. Can you please elaborate?
Cheers,
Bowen
On 05/11/2024 12:34, Jiri Steuer (EIT) wrote:
Hi all,
Is there an easy way to check the moment/milestone when the data across
multiple data centers will be in sync (in the case that other applications
and user access are disabled)? I am thinking about monitoring throughput,
or …? Thanks for any feedback.
J. Steuer
*From:* Bowen Song via user <user@cassandra.apache.org>
*Sent:* Tuesday, November 5, 2024 1:12 PM
*To:* d...@cassandra.apache.org; user@cassandra.apache.org
*Cc:* Bowen Song <bo...@bso.ng>
*Subject:* [External] Unexplained stuck memtable flush
Hi all,
We have a cluster running Cassandra 4.1.1. We are seeing
the memtable flush randomly getting stuck. This has happened twice
in the last 10 days, to two different nodes in the same cluster.
This started to happen after we enabled CDC, and each time it got
stuck, there was at least one repair running involving the
affected node.
The signs of the stuck memtable flush are most obvious in the
"StatusLogger" logs.

At the beginning, the MemtablePostFlush and MemtableFlushWriter pools were
stuck with 1 and 2 active tasks respectively, and a small number of pending
tasks.
INFO [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:65 -
Pool Name                    Active  Pending  Completed  Blocked  All Time Blocked
ReadStage                         0        0   34052333        0                 0
CompactionExecutor                0        0    1019777        0                 0
MutationStage                     0        0   14930764        0                 0
MemtableReclaimMemory             0        0      21877        0                 0
PendingRangeCalculator            0        0        177        0                 0
Repair#61                         0        0       1344        0                 0
GossipStage                       0        0     889452        0                 0
SecondaryIndexManagement          0        0          1        0                 0
HintsDispatcher                   0        0         19        0                 0
Repair-Task                       0        0         65        0                 0
RequestResponseStage              0        0   44447834        0                 0
Native-Transport-Requests         0        0    8967921        0                 0
MigrationStage                    0        0          5        0                 0
MemtableFlushWriter               2       10      21781        0                 0
MemtablePostFlush                 1       11      47856        0                 0
PerDiskMemtableFlushWriter_0      0        0      21769        0                 0
Sampler                           0        0          0        0                 0
ValidationExecutor                0        0      36651        0                 0
ViewBuildExecutor                 0        0          0        0                 0
InternalResponseStage             0        0        240        0                 0
AntiEntropyStage                  1     1773     120067        0                 0
CacheCleanupExecutor              0        0          0        0                 0
The number of pending tasks slowly grows over time, while the number of
completed tasks does not increase at all.
INFO [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:65 -
Pool Name                    Active  Pending  Completed  Blocked  All Time Blocked
ReadStage                         0        0   39905462        0                 0
CompactionExecutor                0        0    1170100        0                 0
MutationStage                     0        0   16976992        0                 0
Repair#76                         0        0          0        0                 0
Repair#74                         0        0          0        0                 0
Repair#72                         0        0          0        0                 0
GossipStage                       0        0    1026877        0                 0
Repair-Task                       0        0         83        0                 0
RequestResponseStage              0        0   51122191        0                 0
ValidationExecutor                0        0      36651        0                 0
MemtablePostFlush                 1       54      47856        0                 0
MemtableFlushWriter               2       53      21781        0                 0
CacheCleanupExecutor              0        0          0        0                 0
Repair#67                         0        0          0        0                 0
MemtableReclaimMemory             0        0      21877        0                 0
PendingRangeCalculator            0        0        177        0                 0
Repair#82                         0        0          0        0                 0
Repair#61                         0        0       1344        0                 0
SecondaryIndexManagement          0        0          1        0                 0
HintsDispatcher                   0        0         20        0                 0
Native-Transport-Requests         0        0   10145020        0                 0
MigrationStage                    0        0          5        0                 0
Sampler                           0        0          0        0                 0
PerDiskMemtableFlushWriter_0      0        0      21769        0                 0
ViewBuildExecutor                 0        0          0        0                 0
InternalResponseStage             0        0        996        0                 0
AntiEntropyStage                  1     9694     120067        0                 0
Then things started to get worse: the MutationStage also got stuck.
INFO [ScheduledTasks:1] 2024-11-03 17:05:40,974 StatusLogger.java:65 -
Pool Name                    Active  Pending  Completed  Blocked  All Time Blocked
ReadStage                         0        0   40186426        0                 0
CompactionExecutor                0        0    1171479        0                 0
MutationStage                   128   201414   16988212        0                 0
Repair#76                         0        0          0        0                 0
Repair#74                         0        0          0        0                 0
Repair#72                         0        0          0        0                 0
GossipStage                       0        0    1032988        0                 0
Repair-Task                       0        0         85        0                 0
RequestResponseStage              0        0   51307497        0                 0
ValidationExecutor                0        0      36651        0                 0
MemtablePostFlush                 1       54      47856        0                 0
MemtableFlushWriter               2       53      21781        0                 0
CacheCleanupExecutor              0        0          0        0                 0
Repair#67                         0        0          0        0                 0
MemtableReclaimMemory             0        0      21877        0                 0
Repair#84                         0        0          0        0                 0
PendingRangeCalculator            0        0        177        0                 0
Repair#82                         0        0          0        0                 0
Repair#61                         0        0       1344        0                 0
SecondaryIndexManagement          0        0          1        0                 0
HintsDispatcher                   0        0         20        0                 0
Native-Transport-Requests         0        0   10194549        0                 0
MigrationStage                    0        0          5        0                 0
Sampler                           0        0          0        0                 0
PerDiskMemtableFlushWriter_0      0        0      21769        0                 0
ViewBuildExecutor                 0        0          0        0                 0
InternalResponseStage             0        0       1087        0                 0
AntiEntropyStage                  1    10364     120067        0                 0
Writes on this node start to time out and fail, and if left untouched it
only gets worse, eventually leading to a JVM OOM and crash.
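(As an aside, the same per-pool figures that StatusLogger prints are also
exposed as JMX gauges, so a watchdog can catch this state long before the
OOM. Below is a minimal sketch; it assumes a node with JMX on the default
port 7199 and no authentication, and the class name is just illustrative.)

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Polls the pending-task gauges of the two pools that get stuck.
    public class FlushWatchdog
    {
        public static void main(String[] args) throws Exception
        {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url))
            {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                for (String pool : new String[]{ "MemtablePostFlush", "MemtableFlushWriter" })
                {
                    ObjectName gauge = new ObjectName(
                            "org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope="
                            + pool + ",name=PendingTasks");
                    // Pending tasks growing while CompletedTasks stays flat is the
                    // signature visible in the StatusLogger output above.
                    System.out.println(pool + " pending = " + mbs.getAttribute(gauge, "Value"));
                }
            }
        }
    }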
By inspecting the heap dump created at the OOM, we can see that both of the
MemtableFlushWriter threads are stuck on line 1190
<https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1190>
of ColumnFamilyStore.java:
// mark writes older than the barrier as blocking progress, permitting them to exceed our memory limit
// if they are stuck waiting on it, then wait for them all to complete
writeBarrier.markBlocking();
writeBarrier.await();    // <----------- stuck here
And the MemtablePostFlush thread is stuck on line 1094
<https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1094>
in the same file.
try
{
    // we wait on the latch for the commitLogUpperBound to be set, and so that waiters
    // on this task can rely on all prior flushes being complete
    latch.await();    // <----------- stuck here
}
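Read together, the two stack traces are two waits pointed at each other:
the flush writer cannot proceed until every write older than its barrier
completes, and other work (post-flush, and eventually the mutations piling
up in MutationStage) cannot proceed until the flush completes. The toy
model below shows only the shape of such a hang; it uses plain
CountDownLatches in place of Cassandra's OpOrder machinery, and all names
in it are illustrative, not Cassandra's.

    import java.util.concurrent.CountDownLatch;

    // Toy model of the hang's shape, NOT Cassandra's actual flush code: the
    // "flush" waits for a pre-barrier write to drain, while that write is
    // itself blocked behind the flush, so neither latch ever opens.
    public class StuckFlushModel
    {
        public static void main(String[] args) throws InterruptedException
        {
            CountDownLatch writesDrained = new CountDownLatch(1); // ~ writeBarrier.await()
            CountDownLatch flushDone = new CountDownLatch(1);     // ~ latch.await()

            Thread flushWriter = new Thread(() -> {
                try
                {
                    writesDrained.await();    // waits forever: the write never drains
                    flushDone.countDown();
                }
                catch (InterruptedException ignored) {}
            }, "MemtableFlushWriter");

            Thread preBarrierWrite = new Thread(() -> {
                try
                {
                    flushDone.await();        // waits forever: the flush never finishes
                    writesDrained.countDown();
                }
                catch (InterruptedException ignored) {}
            }, "pre-barrier-write");

            flushWriter.setDaemon(true);      // let the JVM exit despite the hang
            preBarrierWrite.setDaemon(true);
            flushWriter.start();
            preBarrierWrite.start();

            flushWriter.join(2000);           // both joins time out: circular wait
            preBarrierWrite.join(2000);
            System.out.println("flush writer still stuck: " + flushWriter.isAlive());
            System.out.println("write still stuck: " + preBarrierWrite.isAlive());
        }
    }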
Our top suspect is CDC interacting with repair, since this started to
happen shortly after we enabled CDC on the nodes, and a repair was running
each time. But we have not been able to reproduce this in a testing
cluster, and we don't know what the next step should be to troubleshoot
this issue. So I'm posting it to the mailing lists, hoping someone may know
something about it or can point me in the right direction.
p.s.: sorry about posting this to both the user & dev mailing lists. It's
an end-user-related issue but involves Cassandra internals, so I couldn't
decide which one is better suited.
Cheers,
Bowen