Hi Jiri,
Thank you for taking a look at this issue. But I'm sorry, I don't really
understand your message. Can you please elaborate?
Cheers,
Bowen
On 05/11/2024 12:34, Jiri Steuer (EIT) wrote:
Hi all,
Is it possible to easily check the moment/milestone when the data across
multiple data centers
Of course, let me explain the situation. I have a general question without a direct
relation to the problem with “Unexplained stuck memtable flush”. I would like to
know how I can identify the point at which all nodes across all data centers are
in sync.
* It is a little tricky to wait e.g. 1 day, 2 d
If it is not related to the memtable flush issue, can you please post in
a different mailing list thread instead?
By replying to this thread, everyone reading it would initially assume
it is somehow related, which is neither good for them (wasting their time
trying to understand it) nor you (your
Each physical data center corresponds to a "logical" Cassandra DC (a group
of nodes).
In our situation, we need to move one of our physical data centers (i.e.,
the server rooms) to a new location, which will involve an extended period
of downtime.
Thanks
Edi
On Tue, Nov 5, 2024 at 1:27 PM Bowen S
From the way you wrote this, I suspect the name DC may have a different
meaning here. Are you talking about the physical location (i.e. server
rooms), or the Cassandra DC (i.e. group of nodes for replication purposes)?
On 05/11/2024 11:01, edi mari wrote:
Hello,
We have a Cassandra cluster deploy
You just confirmed my suspicion. You are indeed referring to both the
physical location of the servers and the logical Cassandra DC with the same
term here.
The questions are related to the procedure of migrating the server
hardware to a new location, not the Cassandra DC.
Assuming that the IP addre
Hi all,
Is it possible to easily check the moment/milestone when the data across multiple
data centers will be in sync (in the case that other applications and user access
are disabled)? I am thinking about monitoring throughput, or …? Thanks for feedback
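(For illustration, one rough way to watch for that moment, assuming client traffic
is stopped: check on every node that there are no hints left to deliver. The hints
directory path and the tpstats pool name below are assumptions, and a full repair
remains the only hard guarantee. A minimal sketch:)

#!/usr/bin/env python3
# Rough sketch: report whether this node still has hints left to deliver.
# Assumptions (adjust for your setup): default hints directory location,
# nodetool on PATH, and a tpstats pool named "HintsDispatcher" (Cassandra 4.x).
import pathlib
import subprocess

HINTS_DIR = pathlib.Path("/var/lib/cassandra/hints")  # assumed default path

def _to_int(value: str) -> int:
    # tpstats may print non-numeric placeholders; treat them as zero.
    return int(value) if value.isdigit() else 0

def pending_hint_files() -> int:
    # Undelivered hints are stored as *.hints files on disk.
    return len(list(HINTS_DIR.glob("*.hints")))

def hints_dispatcher_busy() -> bool:
    # Look at the Active/Pending columns of the HintsDispatcher pool.
    out = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines():
        if line.startswith("HintsDispatcher"):
            cols = line.split()
            return _to_int(cols[1]) > 0 or _to_int(cols[2]) > 0
    return False  # pool not listed; treat as idle

if __name__ == "__main__":
    files = pending_hint_files()
    busy = hints_dispatcher_busy()
    if files == 0 and not busy:
        print("No hints left on this node; check every node in every DC.")
    else:
        print(f"Still catching up: {files} hint file(s), dispatcher busy: {busy}")

An empty result on every node only means the hinted writes have been delivered;
a full repair is still the real milestone.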
J. Steuer
Hi all,
We have a cluster running Cassandra 4.1.1. We are seeing the memtable
flush randomly getting stuck. This has happened twice in the last 10
days, to two different nodes in the same cluster. This started to happen
after we enabled CDC, and each time it got stuck, there was at least one
Hi Bowen, would it be possible to share a full thread dump?
Regards,
Dmitry
On Tue, 5 Nov 2024 at 12:12, Bowen Song via user
wrote:
> Hi all,
>
> We have a cluster running Cassandra 4.1.1. We are seeing the memtable
> flush randomly getting stuck. This has happened twice in the last 10 days,
>
Thank you for your reply, Bowen.
Correct, the questions were about migrating the server hardware to a new
location, not the Cassandra DC.
Wouldn't it be a good idea to use hints to bring the data in DC3 up to date?
I'll extend the hint window (e.g., to one week) and allow the other data
centers (DC1 a
I am speaking about a thread dump (stack traces for all threads), not a
heap dump. The heap dump should contain the thread stack info.
A thread dump (stack traces) is small and does not contain sensitive info.
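For reference, a minimal sketch of capturing one with standard JDK tooling; it
assumes jcmd is on the PATH and a single Cassandra JVM on the host (jstack <pid>
works just as well):

#!/usr/bin/env python3
# Sketch: capture a thread dump (stack traces of all threads) from the
# Cassandra JVM. Assumes jcmd is on PATH and exactly one local JVM whose
# main class is CassandraDaemon.
import subprocess
import time

def cassandra_pid() -> str:
    # jcmd with no arguments lists running JVMs as "<pid> <main class> ...".
    jvms = subprocess.run(["jcmd"], capture_output=True, text=True, check=True).stdout
    for line in jvms.splitlines():
        if "CassandraDaemon" in line:
            return line.split()[0]
    raise RuntimeError("No Cassandra JVM found")

def thread_dump(pid: str) -> str:
    # Thread.print is the jcmd equivalent of jstack.
    return subprocess.run(
        ["jcmd", pid, "Thread.print"], capture_output=True, text=True, check=True
    ).stdout

if __name__ == "__main__":
    pid = cassandra_pid()
    path = f"/tmp/cassandra-threads-{time.strftime('%Y%m%dT%H%M%S')}.txt"
    with open(path, "w") as f:
        f.write(thread_dump(pid))
    print(f"Thread dump written to {path}")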
Regards,
Dmitry
On Tue, 5 Nov 2024 at 13:53, Bowen Song via user
wrote:
> It's about 18GB in
It's about 18GB in size and may contain a huge amount of sensitive data
(e.g. all the pending writes), so I can't share it. However, if there's
any particular piece of information you would like to have, I'm more
than happy to extract the info from the dump and share it here.
On 05/11/2024
Sorry, I must have misread it. The full thread dump is attached. I
compressed it with gzip because the text file is over 1 MB in size.
On 05/11/2024 14:04, Dmitry Konstantinov wrote:
I am speaking about a thread dump (stack traces for all threads), not
a heap dump. The heap dump should contain
Hinted handoff is a best-effort approach, and relying on it alone is a
bad idea. Hints can get lost for a number of reasons, such as getting
too old or too big, or the node storing the hints dying. You should rely
on regular repair to guarantee the correctness of the data. You may use
hinted
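For what it's worth, a minimal sketch of what a scripted "regular repair" pass
could look like; the keyspace name is a placeholder, and a purpose-built tool
such as Cassandra Reaper is usually a better fit than a hand-rolled loop:

#!/usr/bin/env python3
# Sketch: run a full, primary-range repair for selected keyspaces on the
# local node. Run it on every node (staggered) so the whole ring is covered.
import subprocess

KEYSPACES = ["my_keyspace"]  # placeholder: list your application keyspaces

def repair(keyspace: str) -> None:
    # -full: not incremental; -pr: only this node's primary ranges, so the
    # cluster is fully covered once every node has run it.
    subprocess.run(["nodetool", "repair", "-full", "-pr", keyspace], check=True)

if __name__ == "__main__":
    for ks in KEYSPACES:
        print(f"Repairing {ks} ...")
        repair(ks)
        print(f"Finished {ks}")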
I ran into this a few months ago, and in my case I tracked it down to an
issue with ZFS not unlinking commitlogs properly.
https://issues.apache.org/jira/browse/CASSANDRA-19564
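For anyone who wants to check for the same pattern, a rough sketch of a watcher
that flags commitlog segments piling up, which is one visible symptom when flushes
stop making progress; the path and thresholds are assumptions:

#!/usr/bin/env python3
# Sketch: warn if the commitlog directory keeps growing. Segments can only be
# deleted once the memtables they cover have been flushed, so uninterrupted
# growth is a hint that flushing is stuck. Path and thresholds are assumptions.
import pathlib
import time

COMMITLOG_DIR = pathlib.Path("/var/lib/cassandra/commitlog")  # assumed default
CHECK_INTERVAL_S = 300
GROWTH_CHECKS_BEFORE_WARN = 6  # roughly 30 minutes of steady growth

def segment_count() -> int:
    return len(list(COMMITLOG_DIR.glob("CommitLog-*.log")))

if __name__ == "__main__":
    prev, growing = segment_count(), 0
    while True:
        time.sleep(CHECK_INTERVAL_S)
        cur = segment_count()
        growing = growing + 1 if cur > prev else 0
        if growing >= GROWTH_CHECKS_BEFORE_WARN:
            print(f"WARNING: commitlog segments grew for {growing} checks in a row "
                  f"(now {cur}); check MemtableFlushWriter/MemtablePostFlush in a thread dump")
        prev = cur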
On Tue, Nov 5, 2024 at 6:05 AM Dmitry Konstantinov
wrote:
> I am speaking about a thread dump (stack traces for all th
Hi Jon,
That is interesting. We happen to be running Cassandra on ZFS. However,
we have not had any incidents for years with this setup; the only change
is the recent addition of CDC.
I can see that in CASSANDRA-19564, the MemtablePostFlush thread was
stuck on the unlink() syscall. But in our
Yeah, I looked through your stack trace and saw it wasn't the same thing,
but the steps to identify the root cause should be the same.
I nuked ZFS from orbit :) This was happening across all the machines at
various times in the cluster, and we haven't seen a single issue since
switching to XFS.
> On Nov 5, 2024, at 4:12 AM, Bowen Song via user
> wrote:
>
> Writes on this node start to time out and fail. But if left untouched, it's
> only gonna get worse, and eventually lead to a JVM OOM and crash.
>
> By inspecting the heap dump created at OOM, we can see that both of the
> Memtable
Hello,
We have a Cassandra cluster deployed across three different data centers,
with each data center (DC1, DC2, and DC3) hosting 50 Cassandra nodes.
We currently keep one replica in each data center.
We plan to migrate DC3, including storage and servers, to a new data
center.
1. What woul
Found the issue - num_tokens was set incorrectly in my container. Upgrade
successful!
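In case it saves someone else the same digging, a small sketch of a pre-upgrade
sanity check comparing num_tokens in cassandra.yaml against the tokens the node
already owns; the file location and output parsing are assumptions, written
against 4.x formatting:

#!/usr/bin/env python3
# Sketch: compare num_tokens in cassandra.yaml with the number of tokens this
# node already owns, so a container rebuild with the wrong setting is caught
# before an upgrade. Paths and parsing are assumptions; adjust for your image.
import re
import subprocess

CASSANDRA_YAML = "/etc/cassandra/cassandra.yaml"  # assumed location in the container

def configured_num_tokens() -> int:
    with open(CASSANDRA_YAML) as f:
        for line in f:
            m = re.match(r"^\s*num_tokens:\s*(\d+)", line)
            if m:
                return int(m.group(1))
    return 1  # assume the single-token default when the setting is commented out

def owned_tokens() -> int:
    # "nodetool info --tokens" prints one "Token" line per owned token (4.x).
    out = subprocess.run(
        ["nodetool", "info", "--tokens"], capture_output=True, text=True, check=True
    ).stdout
    return sum(1 for line in out.splitlines() if line.startswith("Token"))

if __name__ == "__main__":
    cfg, owned = configured_num_tokens(), owned_tokens()
    verdict = "OK" if cfg == owned else "MISMATCH - fix num_tokens before upgrading"
    print(f"num_tokens in yaml: {cfg}, tokens owned: {owned} -> {verdict}")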
-Joe
On 11/5/2024 2:27 PM, Joe Obernberger wrote:
Hi all - getting an error trying to upgrade our 4.x cluster to 5. The
following message repeats over and over and then the pod crashes:
Heap dump creation on u
Funnily enough, we used to run on ext4 and XFS on mdarray RAID1, but the
crappy disks we had (and still have) would randomly spit out garbage data
every once in a while. We suspected it was a firmware bug but were unable to
confirm or reliably reproduce it. Other than this behaviour, those disks
work fine
I will give it a try and see what I can find. I plan to go down the
rabbit hole tomorrow. Will keep you updated.
On 05/11/2024 17:34, Jeff Jirsa wrote:
On Nov 5, 2024, at 4:12 AM, Bowen Song via user
wrote:
Writes on this node start to time out and fail. But if left
untouched, it's only
Hi all - getting an error trying to upgrade our 4.x cluster to 5. The
following message repeats over and over and then the pod crashes:
Heap dump creation on uncaught exceptions is disabled.
DEBUG [MemtableFlushWriter:2] 2024-11-05 19:25:12,763
ColumnFamilyStore.java:1379 - Flu