Re: [External]Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user

Hi Jiri,

Thank you for taking a look at this issue. But I'm sorry, I don't really 
understand your message. Can you please elaborate?


Cheers,
Bowen

On 05/11/2024 12:34, Jiri Steuer (EIT) wrote:


Hi all,

Is there an easy way to check the moment/milestone when the data across 
multiple data centers is in sync (in a situation where other applications and 
user access are disabled)? I am thinking about monitoring throughput 
or …? Thanks for any feedback.


   J. Steuer



RE: [External]Unexplained stuck memtable flush

2024-11-05 Thread Jiri Steuer (EIT)
Of course, let me explain the situation. I have a general question without a direct 
relation to the "Unexplained stuck memtable flush" problem. I would like to 
know how I can identify the point at which all nodes across all data centers 
are in sync.

  *   It is a little tricky to simply wait e.g. 1 day, 2 days or longer for the 
synchronization to complete.

My question is: do you see an easy way to check that an expected large 
synchronization (e.g. after extending the cluster – adding a new data center or 
adding half of the nodes as new) has finished, from a data sync perspective? 
Thanks for sharing your best practices, regards

Jiri




Re: [External]Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
If it is not related to the memtable flush issue, can you please post it in 
a different mailing list thread instead?


By replying to this thread, everyone reading it will initially assume it is 
somehow related, which is neither good for them (wasting their time trying 
to understand it) nor for you (your message may get ignored because it 
appears off-topic).




Re: Migration Cassandra to a new data center

2024-11-05 Thread edi mari
Each physical data center corresponds to a "logical" Cassandra DC (a group
of nodes).
In our situation, we need to move one of our physical data centers (i.e.,
the server rooms) to a new location, which will involve an extended period
of downtime.

Thanks
Edi



Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
From the way you wrote this, I suspect the name DC may have a different 
meaning here. Are you talking about the physical location (i.e. server 
rooms), or the Cassandra DC (i.e. the group of nodes for replication purposes)?


On 05/11/2024 11:01, edi mari wrote:

Hello,
We have a Cassandra cluster deployed across three different data 
centers, with each data center (DC1, DC2, and DC3) hosting 50 
Cassandra nodes.


We are currently saving one replica in each data center.
We plan to migrate DC3, including storage and servers, to a new data 
center.


1. What would be the best method to perform this migration, which 
could take several days (2–5 days)?
2. Would relying on hints in the other two data centers be a good 
approach?


Thanks
Edi


Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
You just confirmed my suspicion. You are indeed referring to both the 
physical location of the servers and the logical Cassandra DC by the same 
term here.


The questions are related to the procedure of migrating the server 
hardware to a new location, not the Cassandra DC.


Assuming that the IP addresses of the servers don't change, and all data on 
all servers can be preserved, this process should be fairly simple and 
straightforward.


I believe the best approach here is:

1. ensure the application can handle the database servers going offline
   for an extended period of time
   (e.g. they talk to the non-migrating nodes, and the query CLs don't
   require nodes in the migrating DC to be available)
2. ensure the gc_grace_seconds of all tables is long enough to complete
   the migration and run repair twice
3. run and finish a full repair right before the migration begins
4. migrate the server hardware to the new location
5. run a full repair again to bring the data back in sync
6. restore the gc_grace_seconds back to an appropriate value

As always, please ensure you have data backups; making a contingency 
plan for potential issues in advance is also highly recommended.
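
For illustration only, a minimal sketch of steps 2, 3, 5 and 6 using cqlsh and 
nodetool (the keyspace and table names are hypothetical, and the exact values 
should be sized to fit your migration window):

    # step 2: raise gc_grace_seconds so tombstones survive the whole migration
    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 1209600;"   # e.g. 14 days

    # step 3 (before the move) and step 5 (after the move): full repair
    nodetool repair --full my_keyspace

    # step 6: restore the previous value once the post-migration repair completes
    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 864000;"    # default 10 days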





RE: [External]Unexplained stuck memtable flush

2024-11-05 Thread Jiri Steuer (EIT)
Hi all,

Is there an easy way to check the moment/milestone when the data across 
multiple data centers is in sync (in a situation where other applications and 
user access are disabled)? I am thinking about monitoring throughput or …? 
Thanks for any feedback.

   J. Steuer







Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user

Hi all,

We have a cluster running Cassandra 4.1.1. We are seeing the memtable 
flush randomly getting stuck. This has happened twice in the last 10 
days, to two different nodes in the same cluster. This started to happen 
after we enabled CDC, and each time it got stuck, there was at least one 
repair running involving the affected node.


The signs of the stuck memtable flush are most obvious in the 
"StatusLogger" logs.


At the beginning, the MemtablePostFlush and MemtableFlushWriter pools got 
stuck, with 1 and 2 active tasks respectively, and a small number of 
pending tasks.


INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:65 - Pool Name                     Active   Pending   Completed   Blocked  All Time Blocked
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - ReadStage                          0         0    34052333         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - CompactionExecutor                 0         0     1019777         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MutationStage                      0         0    14930764         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MemtableReclaimMemory              0         0       21877         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - PendingRangeCalculator             0         0         177         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Repair#61                          0         0        1344         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - GossipStage                        0         0      889452         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - SecondaryIndexManagement           0         0           1         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - HintsDispatcher                    0         0          19         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Repair-Task                        0         0          65         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - RequestResponseStage               0         0        7834         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Native-Transport-Requests          0         0     8967921         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MigrationStage                     0         0           5         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MemtableFlushWriter                2        10       21781         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - MemtablePostFlush                  1        11       47856         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - PerDiskMemtableFlushWriter_0       0         0       21769         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - Sampler                            0         0           0         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - ValidationExecutor                 0         0       36651         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - ViewBuildExecutor                  0         0           0         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - InternalResponseStage              0         0         240         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - AntiEntropyStage                   1      1773      120067         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 03:50:15,224 StatusLogger.java:69 - CacheCleanupExecutor               0         0           0         0                 0
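
(Side note for anyone following along: the same per-pool Active/Pending/Completed 
counters can also be sampled on demand with "nodetool tpstats" on the affected node, 
rather than waiting for the next StatusLogger interval. A minimal example, assuming 
shell access to the node:

    nodetool tpstats | grep -E 'MemtableFlushWriter|MemtablePostFlush|AntiEntropyStage'
)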


The number of pending tasks slowly grows larger over time, and the 
number of completed tasks does not increase at all.


INFO  [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:65 - Pool Name                     Active   Pending   Completed   Blocked  All Time Blocked
INFO  [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - ReadStage                          0         0    39905462         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - CompactionExecutor                 0         0     1170100         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - MutationStage                      0         0    16976992         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#76                          0         0           0         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 16:33:05,941 StatusLogger.java:69 - Repair#74                          0         0           0         0                 0
INFO  [ScheduledTasks:1] 2024-11-03 16:33:05

Re: Unexplained stuck memtable flush

2024-11-05 Thread Dmitry Konstantinov
Hi Bowen, would it be possible to share a full thread dump?

Regards,
Dmitry


Re: Migration Cassandra to a new data center

2024-11-05 Thread edi mari
Thank you for your reply, Bowen.
Correct, the questions were about migrating the server hardware to a new
location, not the Cassandra DC.

Wouldn't it be a good idea to use hints to bring DC3's data back up to date?
I'll extend the hint window (e.g., to one week) and allow the other data
centers (DC1 and DC2) to save hints for DC3.
Then, when DC3 comes back online, it can receive and process the hints.

Edi



Re: Unexplained stuck memtable flush

2024-11-05 Thread Dmitry Konstantinov
I am speaking about a thread dump (stack traces for all threads), not a
heap dump. The heap dump should contain the thread stack info as well.
A thread dump (stack traces) is small and does not contain sensitive info.
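
(For reference, a couple of common ways to capture one, assuming a JDK is 
available on the node and <pid> is the Cassandra process id:

    jcmd <pid> Thread.print > threaddump.txt
    # or
    jstack -l <pid> > threaddump.txt
)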

Regards,
Dmitry


Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
It's about 18GB in size and may contain a huge amount of sensitive data 
(e.g. all the pending writes), so I can't share it. However, if there's 
any particular piece of information you would like to have, I'm more 
than happy to extract the info from the dump and share it here.



Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
Sorry, I must have misread it. The full thread dump is attached. I 
compressed it with gzip because the text file is over 1 MB in size.




Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
Hinted handoff is a best-effort approach, and relying on it alone is a 
bad idea. Hints can get lost for a number of reasons, such as getting 
too old or too big, or the node storing the hints dying. You should rely 
on regular repair to guarantee the correctness of the data. You may use 
hinted handoff as a supplement, but not as a replacement.
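
(If you do decide to extend the hint window as a supplement, a minimal sketch, 
assuming Cassandra 4.x; note this only changes the runtime value, and the 
persistent setting lives in cassandra.yaml as max_hint_window or 
max_hint_window_in_ms depending on the version:

    # check the current window, then raise it to 7 days (value in milliseconds)
    nodetool getmaxhintwindow
    nodetool setmaxhintwindow 604800000

Also keep an eye on the hints directory disk usage, since a week of hints for a 
whole DC can be substantial.)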




Re: Unexplained stuck memtable flush

2024-11-05 Thread Jon Haddad
I ran into this a few months ago, and in my case I tracked it down to an
issue with ZFS not unlinking commitlogs properly.

https://issues.apache.org/jira/browse/CASSANDRA-19564


Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user

Hi Jon,

That is interesting. We happen to be running Cassandra on ZFS. However, 
we have not had any incident for years with this setup; the only change 
is the recent addition of CDC.


I can see that in CASSANDRA-19564, the MemtablePostFlush thread was 
stuck on the unlink() syscall. But in our case, it was stuck here:


"MemtablePostFlush:1" daemon prio=5 tid=237 WAITING
    at jdk.internal.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
    at 
org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:289)
   local variable: 
org.apache.cassandra.utils.concurrent.WaitQueue$Standard$RegisteredSignal#1
    at 
org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:282)
    at 
org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
   local variable: 
org.apache.cassandra.utils.concurrent.CountDownLatch$Async#7
    at 
org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
    at 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1094)
   local variable: 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush#7
    at 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1077)

    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
   local variable: org.apache.cassandra.concurrent.FutureTask#2555
   local variable: org.apache.cassandra.concurrent.FutureTask#2555
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   local variable: 
org.apache.cassandra.concurrent.SingleThreadExecutorPlus#5

   local variable: java.util.concurrent.ThreadPoolExecutor$Worker#9
   local variable: io.netty.util.concurrent.FastThreadLocalThread#11
   local variable: org.apache.cassandra.concurrent.FutureTask#2555
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)

   local variable: java.util.concurrent.ThreadPoolExecutor$Worker#9
    at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

   local variable: io.netty.util.concurrent.FastThreadLocalRunnable#11
    at java.lang.Thread.run(Thread.java:829)
   local variable: io.netty.util.concurrent.FastThreadLocalThread#11

Which made me believe that they are not the same issue.

Thanks for the info though.

BTW, if you still have access to that oddly behaving ZFS filesystem, 
have a look at the free space (it should have at least 20% free) and the 
fragmentation ratio in "zpool list". If neither is an issue, check 
whether deduplication is enabled or whether there is a large number of 
snapshots. These can have a significant impact on file deletion 
performance on ZFS. It is also worth checking the disks; I have seen 
broken disks that get stuck on some operations, e.g. when a specific 
sector is being read, and this will certainly affect the filesystem 
behaviour.
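
(Roughly the checks I mean, as a sketch, assuming the pool is named "tank":

    zpool list tank                          # CAP and FRAG columns: capacity used and fragmentation
    zfs get dedup tank                       # deduplication should normally be off
    zfs list -t snapshot -r tank | wc -l     # rough count of snapshots
)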


Cheers,
Bowen


Re: Unexplained stuck memtable flush

2024-11-05 Thread Jon Haddad
Yeah, I looked through your stack trace and saw it wasn't the same thing,
but the steps to identify the root cause should be the same.

I nuked ZFS from orbit :)   This was happening across all the machines at
various times in the cluster, and we haven't seen a single issue since
switching to XFS.

Thanks for the advice though, I'll keep it in mind if I encounter it again.

Jon

On Tue, Nov 5, 2024 at 9:18 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> Hi Jon,
>
> That is interesting. We happen to be running Cassandra on ZFS. However we
> have not had any incident for years with this setup, the only change is the
> recent addition of CDC.
>
> I can see that in CASSANDRA-19564, the MemtablePostFlush thread was stuck
> on the unlink() syscall. But in our case, it was stuck here:
>
> "MemtablePostFlush:1" daemon prio=5 tid=237 WAITING
> at jdk.internal.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
> at
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:289)
>local variable:
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$RegisteredSignal#1
> at
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:282)
> at
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>local variable:
> org.apache.cassandra.utils.concurrent.CountDownLatch$Async#7
> at
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
> at
> org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1094)
>local variable:
> org.apache.cassandra.db.ColumnFamilyStore$PostFlush#7
> at
> org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1077)
> at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
> at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
>local variable: org.apache.cassandra.concurrent.FutureTask#2555
>local variable: org.apache.cassandra.concurrent.FutureTask#2555
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>local variable:
> org.apache.cassandra.concurrent.SingleThreadExecutorPlus#5
>local variable: java.util.concurrent.ThreadPoolExecutor$Worker#9
>local variable: io.netty.util.concurrent.FastThreadLocalThread#11
>local variable: org.apache.cassandra.concurrent.FutureTask#2555
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>local variable: java.util.concurrent.ThreadPoolExecutor$Worker#9
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>local variable: io.netty.util.concurrent.FastThreadLocalRunnable#11
> at java.lang.Thread.run(Thread.java:829)
>local variable: io.netty.util.concurrent.FastThreadLocalThread#11
>
> Which made me believe that they are not the same issue.
>
> Thanks for the info though.
>
> BTW, if you still have access to that oddly behaving ZFS filesystem, have
> a look at the free space (it should have at least 20% free) and the
> fragmentation ratio in "zpool list". If neither is an issue, check whether
> deduplication is enabled or there's a large number of snapshots. These can
> have a significant impact on file deletion performance on ZFS. It is also
> worth checking the disks: I have seen broken disks get stuck on some operations,
> e.g. when a specific sector is being read, and this will certainly affect
> the filesystem behaviour.
>
> Cheers,
> Bowen
> On 05/11/2024 16:41, Jon Haddad wrote:
>
> I ran into this a few months ago, and in my case I tracked it down to an
> issue with ZFS not unlinking commitlogs properly.
>
> https://issues.apache.org/jira/browse/CASSANDRA-19564
>
> On Tue, Nov 5, 2024 at 6:05 AM Dmitry Konstantinov 
> wrote:
>
>> I am speaking about a thread dump (stack traces for all threads), not a
>> heap dump. The heap dump should contain thread stacks info.
>> Thread dump (stack traces) is small and does not have sensitive info.
>>
>> Regards,
>> Dmitry
>>
>> On Tue, 5 Nov 2024 at 13:53, Bowen Song via user <
>> user@cassandra.apache.org> wrote:
>>
>>> It's about 18GB in size and may contain a huge amount of sensitive data
>>> (e.g. all the pending writes), so I can't share it. However, if there's any
>>> particular piece of information you would like to have, I'm more than happy
>>> to extract the info from the dump and share it here.
>>> On 05/11/2024 13:01, Dmitry Konstantinov wrote:
>>>
>>> Hi Bowen, would it be possible to share a full thread dump?
>>>
>>> Regards,
>>> Dmitry
>>>
>>> On Tue, 5 Nov 2024 at 12:12, Bowen Song via user <
>>> user@cassandra.apache.org> wrote:
>>>
 Hi all,

 We have a cluster running Cassandra 4.1.1. We are seeing the memtable
 flush randomly getting stuck.

Re: Unexplained stuck memtable flush

2024-11-05 Thread Jeff Jirsa


> On Nov 5, 2024, at 4:12 AM, Bowen Song via user  
> wrote:
> 
> Writes on this node start to time out and fail. But if left untouched, it's
> only going to get worse, and eventually lead to a JVM OOM and crash.
> 
> By inspecting the heap dump created at OOM, we can see that both of the 
> MemtableFlushWriter threads are stuck on line 1190 in ColumnFamilyStore.java:
> 
> // mark writes older than the barrier as blocking progress, 
> permitting them to exceed our memory limit
> // if they are stuck waiting on it, then wait for them all to 
> complete
> writeBarrier.markBlocking();
> writeBarrier.await();   // <--- stuck here
> 
> And the MemtablePostFlush thread is stuck on line 1094 in the same file.
> 
> try
> {
> // we wait on the latch for the commitLogUpperBound to be 
> set, and so that waiters
> // on this task can rely on all prior flushes being complete
> latch.await();   // <--- stuck here
> }
> Our top suspect is CDC interacting with repair, since this started to happen
> shortly after we enabled CDC on the nodes, and each time it happened, a repair
> was running. But we have not been able to reproduce this in a testing cluster,
> and don't know what the next step is to troubleshoot this issue. So I'm posting
> it to the mailing lists, hoping someone may know something about it or point me
> in the right direction.
> 

Wouldn’t be completely surprised if CDC or repair somehow holds a barrier. I’ve
also seen similar behavior pre-3.0 with “very long running read commands” that
hold a barrier on the memtable and prevent its release.

You’ve got the heap (great, way better than most people debugging). Are you
able to navigate through it and look for references to that memtable or other
things holding a barrier?
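
A hedged sketch of one way to dig into such a dump offline with Eclipse Memory Analyzer (MAT); the class name used below, OpOrder$Barrier, is an assumption about what to search for, not a confirmed culprit:

    # Index the dump and generate the standard leak-suspects report
    # (ParseHeapDump.sh ships with Eclipse MAT).
    ./ParseHeapDump.sh /path/to/cassandra-oom.hprof org.eclipse.mat.api:suspects
    # In the MAT UI, an OQL query such as
    #   SELECT * FROM org.apache.cassandra.utils.concurrent.OpOrder$Barrier
    # followed by "Path To GC Roots" on the results shows what is still
    # holding each barrier and, transitively, the memtable it guards.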






Migration Cassandra to a new data center

2024-11-05 Thread edi mari
Hello,
We have a Cassandra cluster deployed across three different data centers,
with each data center (DC1, DC2, and DC3) hosting 50 Cassandra nodes.

We currently keep one replica in each data center.
We plan to migrate DC3, including storage and servers, to a new data
center.

1. What would be the best method to perform this migration, which could
take several days (2–5 days)?
2. Would relying on hints in the other two data centers be a good approach?

Thanks
Edi
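
A hedged aside on question 2: by default, hints are only collected for a node that has been unreachable for up to the hint window (3 hours), so a 2-5 day move would at minimum require raising that window and budgeting for the extra hint storage; this is a sketch of the settings to review (file path is an assumption), not a recommendation to rely on hints alone:

    # How long hints are collected for an unreachable node, and where they live
    # (defaults: max_hint_window is 3 hours; path assumes a package install).
    grep -E 'max_hint_window|hinted_handoff_enabled|hints_directory|max_hints_file_size' \
        /etc/cassandra/cassandra.yaml
    # Once a node has been unreachable longer than the window, new hints are no
    # longer stored for it, and only repair can bring the moved DC back in sync.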


Re: Upgrade from 4 to 5 issue

2024-11-05 Thread Joe Obernberger
Found the issue - num_tokens was set incorrectly in my container. Upgrade 
successful!


-Joe
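
For anyone hitting the same thing, a hedged sketch of the pre-upgrade sanity check (file paths and cqlsh access are assumptions): the num_tokens configured for the new container must match the number of tokens the node already owns.

    # Tokens the node already owns, as recorded locally:
    cqlsh -e "SELECT tokens FROM system.local;"
    # num_tokens configured for the container about to be upgraded:
    grep -E '^num_tokens' /etc/cassandra/cassandra.yaml
    # The number of entries in the first result should equal the second value;
    # otherwise the node typically refuses to start with a token-count mismatch.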

On 11/5/2024 2:27 PM, Joe Obernberger wrote:
Hi all - getting an error trying to upgrade our 4.x cluster to 5.  The 
following message repeats over and over and then the pod crashes:


Heap dump creation on uncaught exceptions is disabled.




DEBUG [MemtableFlushWriter:2] 2024-11-05 19:25:12,763 
ColumnFamilyStore.java:1379 - Flushed to 
[BigTableReader:big(path='/data/cassandra/system_schema/views-9786ac1cdd583201a7cdad556410c985/oa-5-big-Data.db')] 
(1 sstables, 5.078KiB), biggest 5.078KiB, smallest 5.078KiB
DEBUG [MemtableFlushWriter:1] 2024-11-05 19:25:12,763 
ColumnFamilyStore.java:1379 - Flushed to 
[BigTableReader:big(path='/data/cassandra/system_schema/functions-96489b7980be3e14a70166a0b9159450/oa-5-big-Data.db')] 
(1 sstables, 5.326KiB), biggest 5.326KiB, smallest 5.326KiB
INFO  [StorageServiceShutdownHook] 2024-11-05 19:25:12,763 
HintsService.java:234 - Paused hints dispatch
DEBUG [StorageServiceShutdownHook] 2024-11-05 19:25:12,774 
AbstractCommitLogSegmentManager.java:411 - Segment 
CommitLogSegment(/data/commitlog/CommitLog-8-1730834635824.log) is no 
longer active and will be deleted now
DEBUG [PERIODIC-COMMIT-LOG-SYNCER] 2024-11-05 19:25:12,776 
HeapUtils.java:133 - Heap dump creation on uncaught exceptions is 
disabled.
root@cassandra-0:/var/log/cassandra# FATA[] nsexec-1[3168606]: 
failed to open /proc/3165968/ns/ipc: No such file or directory
FATA[] nsexec-0[3168603]: failed to sync with stage-1: next state: 
Invalid argument
ERRO[] exec failed: unable to start container process: error 
executing setns process: exit status 1
FATA[] nsexec-1[3168622]: failed to open /proc/3165968/ns/ipc: No 
such file or directory
FATA[] nsexec-0[3168621]: failed to sync with stage-1: next state: 
Invalid argument
ERRO[] exec failed: unable to start container process: error 
executing setns process: exit status 1
FATA[] nsexec-1[3168640]: failed to open /proc/3165968/ns/ipc: No 
such file or directory
FATA[] nsexec-0[3168637]: failed to sync with stage-1: next state: 
Invalid argument
ERRO[] exec failed: unable to start container process: error 
executing setns process: exit status 1


--

Any ideas?  I'm at a loss.

-Joe






Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
Funny enough, we used to run ext4 and XFS on mdarray RAID1, but the 
crappy disks we had (and still have) randomly spit out garbage data 
every once in a while. We suspected a firmware bug but were unable to 
confirm or reliably reproduce it. Other than this behaviour, those disks 
work fine. Since mdarray can't tell which copy of the data is good, it 
has a 50% chance of overwriting the good data with the garbage, usually 
resulting in SSTable corruption. Since we switched to ZFS, we have never 
had an issue with that again. ZFS saved countless hours of replacing nodes, 
and saved the replacement cost of all the crappy disks (we had many hundreds 
of them). So I'm fairly happy with that outcome. Do I wish we had better 
disks? Yes, definitely. But we don't, so ZFS will have to do.


I will try to find out what is blocking the barrier, and will post 
updates here if I find anything.


On 05/11/2024 17:22, Jon Haddad wrote:
Yeah, I looked through your stack trace and saw it wasn't the same 
thing, but the steps to identify the root cause should be the same.


I nuked ZFS from orbit :)   This was happening across all the machines 
at various times in the cluster, and we haven't seen a single issue 
since switching to XFS.


Thanks for the advice though, I'll keep it in mind if I encounter it 
again.


Jon

On Tue, Nov 5, 2024 at 9:18 AM Bowen Song via user 
 wrote:


Hi Jon,

That is interesting. We happen to be running Cassandra on ZFS.
However, we have not had any incident for years with this setup;
the only change is the recent addition of CDC.

I can see that in CASSANDRA-19564, the MemtablePostFlush thread
was stuck on the unlink() syscall. But in our case, it was stuck here:

"MemtablePostFlush:1" daemon prio=5 tid=237 WAITING
    at jdk.internal.misc.Unsafe.park(Native Method)
    at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
    at

org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:289)
   local variable:
org.apache.cassandra.utils.concurrent.WaitQueue$Standard$RegisteredSignal#1
    at

org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:282)
    at

org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
   local variable:
org.apache.cassandra.utils.concurrent.CountDownLatch$Async#7
    at

org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
    at

org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1094)
   local variable:
org.apache.cassandra.db.ColumnFamilyStore$PostFlush#7
    at

org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1077)
    at
org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
    at
org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
   local variable: org.apache.cassandra.concurrent.FutureTask#2555
   local variable: org.apache.cassandra.concurrent.FutureTask#2555
    at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   local variable:
org.apache.cassandra.concurrent.SingleThreadExecutorPlus#5
   local variable:
java.util.concurrent.ThreadPoolExecutor$Worker#9
   local variable:
io.netty.util.concurrent.FastThreadLocalThread#11
   local variable: org.apache.cassandra.concurrent.FutureTask#2555
    at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   local variable:
java.util.concurrent.ThreadPoolExecutor$Worker#9
    at

io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   local variable:
io.netty.util.concurrent.FastThreadLocalRunnable#11
    at java.lang.Thread.run(Thread.java:829)
   local variable:
io.netty.util.concurrent.FastThreadLocalThread#11

Which made me believe that they are not the same issue.

Thanks for the info though.

BTW, if you still have access to that oddly behaving ZFS
filesystem, have a look at the free space (it should have at least
20% free) and the fragmentation ratio in "zpool list". If neither
is an issue, check whether deduplication is enabled or there's a
large number of snapshots. These can have a significant impact on
file deletion performance on ZFS. It is also worth checking the disks:
I have seen broken disks get stuck on some operations, e.g. when a
specific sector is being read, and this will certainly affect the
filesystem behaviour.
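
A minimal sketch of those checks (the pool name "tank" and the device name are placeholders):

    zpool list tank                           # CAP should stay below ~80%, FRAG reasonably low
    zfs get dedup tank                        # deduplication adds to unlink cost; usually best off
    zfs list -H -t snapshot -r tank | wc -l   # rough count of snapshots on the pool
    zpool status -x tank                      # reports known device or checksum errors
    smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'   # per-disk health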

Cheers,
Bowen

On 05/11/2024 16:41, Jon Haddad wrote:

I ran into this a few months ago, and in my case I tracked it
down to an issue with ZFS not unlinking commitlogs properly.

https://issues.apache.org/jira/browse/CASSANDRA-

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
I will give it a try and see what I can find. I plan to go down the 
rabbit hole tomorrow. Will keep you updated.


On 05/11/2024 17:34, Jeff Jirsa wrote:



On Nov 5, 2024, at 4:12 AM, Bowen Song via user 
 wrote:


Writes on this node start to time out and fail. But if left 
untouched, it's only going to get worse, and eventually lead to a JVM OOM 
and crash.


By inspecting the heap dump created at OOM, we can see that both of 
the MemtableFlushWriter threads are stuck on line 1190 
in ColumnFamilyStore.java:


    // mark writes older than the barrier as blocking 
progress, permitting them to exceed our memory limit
    // if they are stuck waiting on it, then wait for them 
all to complete

    writeBarrier.markBlocking();
    writeBarrier.await();   // <--- stuck here

And the MemtablePostFlush thread is stuck on line 1094 
in the same file.


    try
    {
    // we wait on the latch for the commitLogUpperBound 
to be set, and so that waiters
    // on this task can rely on all prior flushes being 
complete

    latch.await();   // <--- stuck here
    }
Our top suspect is CDC interacting with repair, since this started to 
happen shortly after we enabled CDC on the nodes, and each time it 
happened, a repair was running. But we have not been able to reproduce 
this in a testing cluster, and don't know what the next step is to 
troubleshoot this issue. So I'm posting it to the mailing lists, hoping 
someone may know something about it or point me in the right direction.
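
For context, a hedged sketch of what enabling CDC involves and what may be worth watching while digging into this (the keyspace/table name and paths below are assumptions):

    # CDC is switched on globally in cassandra.yaml and then per table:
    grep -E 'cdc_enabled|cdc_raw_directory|cdc_total_space' /etc/cassandra/cassandra.yaml
    cqlsh -e "ALTER TABLE my_ks.my_table WITH cdc = true;"
    # Commit log segments containing CDC data are retained under cdc_raw until a
    # consumer removes them; if that directory reaches cdc_total_space, writes to
    # CDC-enabled tables start failing, so its size is worth tracking:
    du -sh /var/lib/cassandra/cdc_raw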




Wouldn’t be completely surprised if CDC or repair somehow holds a 
barrier. I’ve also seen similar behavior pre-3.0 with “very long 
running read commands” that hold a barrier on the memtable and 
prevent its release.


You’ve got the heap (great, way better than most people debugging). 
Are you able to navigate through it and look for references to that 
memtable or other things holding a barrier?






Upgrade from 4 to 5 issue

2024-11-05 Thread Joe Obernberger
Hi all - getting an error trying to upgrade our 4.x cluster to 5.  The 
following message repeats over and over and then the pod crashes:


Heap dump creation on uncaught exceptions is disabled.




DEBUG [MemtableFlushWriter:2] 2024-11-05 19:25:12,763 
ColumnFamilyStore.java:1379 - Flushed to 
[BigTableReader:big(path='/data/cassandra/system_schema/views-9786ac1cdd583201a7cdad556410c985/oa-5-big-Data.db')] 
(1 sstables, 5.078KiB), biggest 5.078KiB, smallest 5.078KiB
DEBUG [MemtableFlushWriter:1] 2024-11-05 19:25:12,763 
ColumnFamilyStore.java:1379 - Flushed to 
[BigTableReader:big(path='/data/cassandra/system_schema/functions-96489b7980be3e14a70166a0b9159450/oa-5-big-Data.db')] 
(1 sstables, 5.326KiB), biggest 5.326KiB, smallest 5.326KiB
INFO  [StorageServiceShutdownHook] 2024-11-05 19:25:12,763 
HintsService.java:234 - Paused hints dispatch
DEBUG [StorageServiceShutdownHook] 2024-11-05 19:25:12,774 
AbstractCommitLogSegmentManager.java:411 - Segment 
CommitLogSegment(/data/commitlog/CommitLog-8-1730834635824.log) is no 
longer active and will be deleted now
DEBUG [PERIODIC-COMMIT-LOG-SYNCER] 2024-11-05 19:25:12,776 
HeapUtils.java:133 - Heap dump creation on uncaught exceptions is disabled.
root@cassandra-0:/var/log/cassandra# FATA[] nsexec-1[3168606]: 
failed to open /proc/3165968/ns/ipc: No such file or directory
FATA[] nsexec-0[3168603]: failed to sync with stage-1: next state: 
Invalid argument
ERRO[] exec failed: unable to start container process: error 
executing setns process: exit status 1
FATA[] nsexec-1[3168622]: failed to open /proc/3165968/ns/ipc: No 
such file or directory
FATA[] nsexec-0[3168621]: failed to sync with stage-1: next state: 
Invalid argument
ERRO[] exec failed: unable to start container process: error 
executing setns process: exit status 1
FATA[] nsexec-1[3168640]: failed to open /proc/3165968/ns/ipc: No 
such file or directory
FATA[] nsexec-0[3168637]: failed to sync with stage-1: next state: 
Invalid argument
ERRO[] exec failed: unable to start container process: error 
executing setns process: exit status 1


--

Any ideas?  I'm at a loss.

-Joe

