Re: Access to the JIRA

2025-02-05 Thread Artem Golovko
Hi Sebastian, Thank you very much, created an account and bug report as well. And is it possible to be invited to the Slack channel as a guest? With best regards, Artem On Wed, 5 Feb 2025 at 18:19, Sebastian Marsching wrote: > > Hi Artem, > > you can request a JIRA account here: > https://selfserve

Re: Access to the JIRA

2025-02-05 Thread Sebastian Marsching
Hi Artem, you can request a JIRA account here: https://selfserve.apache.org/jira-account.html Best regards, Sebastian > On 05.02.2025 at 18:19, Artem Golovko wrote: > > Hello, > > I would like to report a cassandra bug and would like to have jira account > for that. > > Thanks! smime.p

Re: [RELEASE] Apache Cassandra 3.11.18 released

2025-02-05 Thread Štefan Miklošovič
The Cassandra team have identified a performance regression in releases 3.0.31 and 3.11.18. This regression only affects these specific versions and does not occur in the recent 4.0.16, 4.1.8 or 5.0.3 releases. Users are advised to be aware of this when considering upgrades on the 3.0 and 3.11 li

Re: [RELEASE] Apache Cassandra 3.0.31 released

2025-02-05 Thread Štefan Miklošovič
The Cassandra team have identified a performance regression in releases 3.0.31 and 3.11.18. This regression only affects these specific versions and does not occur in the recent 4.0.16, 4.1.8 or 5.0.3 releases. Users are advised to be aware of this when considering upgrades on the 3.0 and 3.11 li

Re: [RELEASE] Apache Cassandra 4.1.8 released

2025-02-03 Thread A via user
UNSUBSCRIBE Sent from Yahoo Mail for iPhone On Monday, February 3, 2025, 6:00 PM, Štefan Miklošovič wrote: The Cassandra team is pleased to announce the release of Apache Cassandra version 4.1.8. Apache Cassandra is a fully distributed database. It is the right choice when you need scalab

Re: Cassandra 5 upgrade and Schema Agreement failures

2025-01-30 Thread Paul Chandler
Yes, I would add the 6 new nodes, then decommission the 6 original nodes, then upgrade to 5.0. Remember to change the seed node values before you decommission the old nodes. Thanks Paul > On 30 Jan 2025, at 14:42, Luciano Greiner wrote: > > Ok, so you mean I setup these new 6 nodes as 4.1.3

Re: Cassandra 5 upgrade and Schema Agreement failures

2025-01-30 Thread Luciano Greiner
Ok, so you mean I setup these new 6 nodes as 4.1.3, then just upgrade the software, correct? Thank you! Luciano On Thu, Jan 30, 2025 at 6:57 AM Paul Chandler wrote: > > Hi Luciano, > > The problem occurs due to Cassandra 5 making changes to the system tables, so > the cluster will be in schem

Re: Cassandra 5 upgrade and Schema Agreement failures

2025-01-30 Thread Paul Chandler
Hi Luciano, The problem occurs due to Cassandra 5 making changes to the system tables, so the cluster will be in schema mismatch during the upgrade process, until all the nodes are on 5.0 Normally this would not be a problem, as the system tables are not replicated anyway, but, as you are find

Re: Schema agreement check after DDL query is unreliable

2025-01-16 Thread Tommy Stendahl via user
Cc: Tommy Stendahl Subject: Re: Schema agreement check after DDL query is unreliable Date: Thu, 16 Jan 2025 10:52:17 + Sorry, wrong list :-) -Original Message- From: Tommy Stendahl via user Reply-To: user@cassandra.apache.org To: user@cassandra.apache.org Cc: Tommy Stendahl Subje

Re: Schema agreement check after DDL query is unreliable

2025-01-16 Thread Tommy Stendahl via user
Sorry, wrong list :-) -Original Message- From: Tommy Stendahl via user Reply-To: user@cassandra.apache.org To: user@cassandra.apache.org Cc: Tommy Stendahl Subject: Schema agreement check after DDL query is unreliable Date: Thu, 16 Jan 2025 10:33:32 + Hi, When we upgraded the java

Re: Enable audit log

2025-01-15 Thread Sebastian Albrecht
On Tue, 14 Jan 2025 at 17:02, Andrew Weaver wrote: > I can confirm that on 4.0.x it works as expected because we use this > extensively. > Hi Andrew, that is good to hear. However, I also tried it with the latest 4.0.15 and see worse behaviour: Activating auditlog with nodetool shows in syslog:

Re: Enable audit log

2025-01-15 Thread Sebastian Albrecht
On Tue, 14 Jan 2025 at 17:00, Dmitry Konstantinov wrote: > Hi all, > > > > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogManager.java#L204 > > I suppose this logic should work during a startup: > https://github.com/apache/cassandra/blob/trunk/src/java/

Re: UNSUBSCRIBE

2025-01-14 Thread jose farfan
UNSUBSCRIBE On Tue, 14 Jan 2025 at 19:32, A via user wrote: > UNSUBSCRIBE > > > Sent from Yahoo Mail for iPhone >

Re: UNSUBSCRIBE

2025-01-14 Thread jose farfan
> > UNSUBSCRIBE > >

Re: Enable audit log

2025-01-14 Thread Andrew Weaver
I can confirm that on 4.0.x it works as expected because we use this extensively. On Tue, Jan 14, 2025, 10:00 AM Jeff Jirsa wrote: > Surprising. Feels like something that should change. If it’s enabled in > yaml, why WOULDNT we want it started on start? > > > > On Jan 14, 2025, at 7:40 AM, Štefa

Re: Enable audit log

2025-01-14 Thread Dmitry Konstantinov
Hi all, > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogManager.java#L204 I suppose this logic should work during a startup: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogManager.java#L109 , shouldn't? It wo

Re: Enable audit log

2025-01-14 Thread Jeff Jirsa
Surprising. Feels like something that should change. If it’s enabled in yaml, why WOULDNT we want it started on start? > On Jan 14, 2025, at 7:40 AM, Štefan Miklošovič wrote: > > Hi Sebastian, > > the behaviour you see seems to be a conscious decision: > > https://github.com/apache/cassand

Re: Enable audit log

2025-01-14 Thread Štefan Miklošovič
Hi Sebastian, the behaviour you see seems to be a conscious decision: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogManager.java#L204 On Tue, Jan 14, 2025 at 4:21 PM Sebastian Albrecht < sebastian.albre...@agido.com> wrote: > Hi, > i am using cassand

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Dinesh Joshi
On Wed, Dec 18, 2024 at 11:50 AM Paul Chandler wrote: > > This is all old history and has been fixed, so is not really what the > question was about, however these old problems have a bad legacy in the > memory of the people that matter. Hence the push back we have now. > I totally understand th

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Paul Chandler
OK, it seems like I didn’t explain it too well, but yes it is the rolling restart 3 times as part of the upgrade that is causing the push back, my message was a bit vague on the use cases because there are confidentiality agreements in place so I can’t share too much. We have had problems in th

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Eric Evans
On Wed, Dec 18, 2024 at 12:26 PM Jeff Jirsa wrote: > I think this is one of those cases where if someone tells us they’re > feeling pain, instead of telling them it shouldn’t be painful, we try to > learn a bit more about the pain. > > For example, both you and Scott expressed surprise at the con

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Eric Evans
On Wed, Dec 18, 2024 at 12:12 PM Jon Haddad wrote: > I think we're talking about different things. > > > Yes, and Paul clarified that it wasn't (just) an issue of having to do > rolling restarts, but the work involved in doing an upgrade. Were it only > the case that the hardest part of doing a

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Jeff Jirsa
I think this is one of those cases where if someone tells us they’re feeling pain, instead of telling them it shouldn’t be painful, we try to learn a bit more about the pain. For example, both you and Scott expressed surprise at the concern of rolling restarts (you repeatedly, Scott mentioned t

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Jon Haddad
Yeah, the issue with the yaml being out of sync is consistent with any other JMX change, such as compaction throughput / threads, etc. You'd have to deploy the config and apply the change via JMX otherwise you'd risk restarting the node and running into an issue. I think there's probably room for

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Jon Haddad
I think we're talking about different things. > Yes, and Paul clarified that it wasn't (just) an issue of having to do rolling restarts, but the work involved in doing an upgrade. Were it only the case that the hardest part of doing an upgrade was the rolling restart... From several messages a

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread C. Scott Andreas
It's clear from discussion on this list that the current "storage_compatibility_mode" implementation and upgrade path for 5.0 is a source of real and legitimate user pain, and is likely to result in many organizations slowing their adoption of the release. Would love to discuss on dev@ how we can

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Eric Evans
On Wed, Dec 18, 2024 at 11:43 AM Jon Haddad wrote: > > We (Wikimedia) have had more (major) upgrades go wrong in some way, than > right. Any significant upgrade is going to be weeks —if not months— in the > making, with careful testing, a phased rollout, and a workable plan for > rollback. We'd

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Jon Haddad
> We (Wikimedia) have had more (major) upgrades go wrong in some way, than right. Any significant upgrade is going to be weeks —if not months— in the making, with careful testing, a phased rollout, and a workable plan for rollback. We'd never entertain doing more than one at a time, it's just way

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Eric Evans
On Tue, Dec 17, 2024 at 2:37 PM Paul Chandler wrote: > It is a mixture of things really, firstly it is a legacy issue where there > have been performance problems in the past during upgrades, these have now > been fixed, but it is not easy to regain the trust in the process. > > Secondly there ar

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Paul Chandler
The ability to move through the SCM via nodetool would definitely help in this situation. I can see there being an issue if the cassandra.yaml is not changed, as the node could revert back to an older mode if the node is restarted. Would there be any other potential problems with exposing

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread C. Scott Andreas
get a table repaired because it is locking up a node, is it still possible to upgrade to 4.0? Jeff From: Jon Haddad Reply-To: Date: Tuesday, December 17, 2024 at 2:20 PM To: Subject: Re: Cassandra 5 Upgrade - Storage Compatibility Modes I strongly suggest moving to 4.0 and to set up Reaper. Man

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread Jeff Masud
To: Subject: Re: Cassandra 5 Upgrade - Storage Compatibility Modes I strongly suggest moving to 4.0 and to set up Reaper. Managing repairs yourself is a waste of time, and you're almost certainly not doing it optimally. Jon On Tue, Dec 17, 2024 at 12:40 PM Miguel Santos-Lopez wrot

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread Jon Haddad
nded recipient of this email, you must not take any > action based upon its contents, nor copy or show it to anyone. Please > contact the sender if you believe you have received this email in error. > > -- > *From:* Josh McKenzie > *Sent:* Tuesday, December 17, 2024 3:11:06 PM

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread Jon Haddad
> Secondly there are some very large clusters involved, 1300+ nodes across multiple physical datacenters, in this case any upgrades are only done out of hours and only one datacenter per day. So a normal upgrade cycle will take multiple weeks, and this one will take 3 times as long. If you only r

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread Paul Chandler
Hi Jon, It is a mixture of things really, firstly it is a legacy issue where there have been performance problems in the past during upgrades, these have now been fixed, but it is not easy to regain the trust in the process. Secondly there are some very large clusters involved, 1300+ nodes acro

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread Josh McKenzie
It's kind of a shame we don't have rolling restart functionality built in to the database / sidecar. I know we've discussed that in the past. +1 to Jon's question - clients (i.e. java driver, etc) should be able to handle disconnects gracefully and route to other coordinators leaving the applic

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread Jon Haddad
Just curious, why is a rolling restart difficult? Is it a tooling issue, stability, just overall fear of messing with things? You *should* be able to do a rolling restart without it being an issue. I look at this as a fundamental workflow that every C* operator should have available, and you

Re: Cassandra 3.11 with Gocql v1.7.0 SSL/TLS handshake failure

2024-12-06 Thread Stanislav Bychkov
Hello, Dileep Rao I have run tests in my environment using docker containers for Cassandra versions 3.11.0 and 4.0.8 with gocql v1.7.0. I tested both with tls enabled and without tls, and I did not encounter any connection issues. To help pinpoint the issue you're facing, I need a bit more informat

Re: Cassandra Restore Issue

2024-12-02 Thread Soyal Badkur
How are you doing backups? 1) running the nodetool snapshot command on all nodes at once 2) while the cluster is in backup mode we take the EBS snapshot backup of that volume 3) running nodetool clearsnapshot to exit the backup mode. How are you doing restores? 1) Creating volume from snapshot 2) At

Re: Cassandra Restore Issue

2024-12-01 Thread Jeff Jirsa
3.11.2 is from Feb 2018. It’s a 6 year old release. It’s VERY hard to guess what’s happening here without a lot more info. How are you doing backups? How are you doing restores? What consistency level are you using for writes? Reads? Is the data in the sstable (can you find it with sstabledump?

Re: Cassandra Restore Issue

2024-12-01 Thread Soyal Badkur
Yes, you are right about it, we are not facing any issues while restoring data in 3.11.4. This is only happening with 3.11.2. For backup we are using the following steps 1) running the nodetool snapshot command on all nodes at once 2) while the cluster is in backup mode we take the EBS snapshot backup of that vol

Re: Cassandra Restore Issue

2024-12-01 Thread guo Maxwell
Hello, You mean 3.11.2 will have the problem of backup data loss, but 3.11.4 will not have this problem, right? Can you provide the backup and recovery steps for these two versions? If possible, it would be better to give a few examples. For example, if three data a, b, and c are inserted, and a w

Re: Token Assignment Strategy for Single-Token Nodes with Multi-Datacenter

2024-12-01 Thread Long Pan
Thanks Jeff for the inspiring reply!

Re: Token Assignment Strategy for Single-Token Nodes with Multi-Datacenter

2024-11-30 Thread Jeff Jirsa
You've enumerated the options and tradeoffs correctly. I've personally seen both implemented, and they're both fine. With option 1, there's also an option that you don't just do "primary range" based repairs, but rather, let a scheduler run through the token range, and use any replica in any D

Re: Cassandra snapshot with TTL

2024-11-19 Thread edi mari
Working, Thanks On Tue, Nov 19, 2024 at 12:01 PM guo Maxwell wrote: > try to replace 1HOURS with 1h ? > > On Tue, 19 Nov 2024 at 17:52, edi mari wrote: > >> Hi, >> I'm attempting to use the `snapshot` command with the `--tt` option, but >> I keep encountering an error. >> Can you help me figure out what I

Re: Cassandra snapshot with TTL

2024-11-19 Thread guo Maxwell
Try replacing 1HOURS with 1h? On Tue, 19 Nov 2024 at 17:52, edi mari wrote: > Hi, > I'm attempting to use the `snapshot` command with the `--tt` option, but I > keep encountering an error. > Can you help me figure out what I might be doing wrong? > > I'm using Cassandra v4.1.5. > > *nodetool snapshot --tt

Re: vector search question - 5.02

2024-11-14 Thread Jon Haddad
It sounds like enabling the JDK's vector preview api could significantly improve Vector search. I haven't verified this myself, but it might be worth trying Java 17 + this flag: --add-modules jdk.incubator.vector I'd love to hear how much of a difference this makes. Jon On Fri, Nov 8, 2024 at

Re: Unexplained stuck memtable flush

2024-11-13 Thread Bowen Song via user
It's interesting how they organised the documentation. So it is guaranteed that the ConcurrentLinkedQueue can be modified and won't break the iterator. But I don't see anything mentioning the reverse. Can an iterator removing items from the middle of a queue (which by definition is FIFO) bre

Re: Unexplained stuck memtable flush

2024-11-13 Thread Sebastian Marsching
Hi Bowen, > I've been reading the source code and the Java documentation. > > In the code > : > > public void signalAll() > { >

Re: Unexplained stuck memtable flush

2024-11-12 Thread Bowen Song via user
Hi Jaydeep, Great work! I believe you are spot on. You should create a Jira ticket detailing the findings. Cheers, Bowen On 10/11/2024 21:49, Jaydeep Chovatia wrote: Upon further studying the thread dump, I think there is a deadlock between CompactionExecutor (Thread1) & MemtableReclaimMemory (

Re: Unexplained stuck memtable flush

2024-11-10 Thread Jon Haddad
I think this is the correct explanation. It's very similar to CASSANDRA-19576, where compaction is unable to finish because we can't insert into compaction history. Really good analysis, Jaydeep. Jon On Sun, Nov 10, 2024 at 1:51 PM Jaydeep Chovatia wrote: > Upon further studying the thread d

Re: Unexplained stuck memtable flush

2024-11-10 Thread Jaydeep Chovatia
Upon further studying the thread dump, I think there is a deadlock between CompactionExecutor (Thread1) & MemtableReclaimMemory (Thread2) in the stack trace I have mentioned in my email below. Please find the deadlock details here: 1. *Thread1:* It has invoked *readOrdering.start()* on *baseCf

Re: vector search question - 5.02

2024-11-08 Thread Patrick McFadin
Hey Joe, I think you are running into a limitation we found in JVector 1 and the use of HNSW. This is where release timing sucks. Jonathan continued work on JVector after merging the initial version into Cassandra 5 trunk before the code freeze. There was JVector 2 and then 3 which is in the JVect

Re: Unexplained stuck memtable flush

2024-11-08 Thread Bowen Song via user
Hi Abe, The com.datastax.oss.cdc.agent thing in the thread dump is the code forked from https://github.com/datastax/cdc-apache-cassandra/ . It sends the CDC data to Kafka instead of Pulsar, and some other additional features/changes. Due to the lack of documentation and examples on how to re

Re: Unexplained stuck memtable flush

2024-11-08 Thread Abe Ratnofsky
Hey Bowen, Are you able to reproduce the issue without running com.datastax.oss.cdc.agent? I don't see any glaringly obvious bugs there indicated by the thread dump, but it would be useful to rule that out, particularly because it runs as an agent and manages new instances of C*-defined classes

Re: Unexplained stuck memtable flush

2024-11-07 Thread Dmitry Konstantinov
A few ideas about timing and timestamp extraction from the heap dump: 1) Here https://github.com/apache/cassandra/blob/c1d89c32d27921d1f77f05d29ee248b8922a4c76/src/java/org/apache/cassandra/db/Keyspace.java#L627 you can check the time when the write operation within "read-hotness-tracker:1" started

Re: vector search question - 5.02

2024-11-07 Thread Joe Obernberger
Found my issue, it was with the primary key being a combination of uuid and type.  With that fixed, I now have a table with 1.5 million vectors (768 dimensions) on a 16 node cluster. While I can now execute a CQL query that includes fields and the order by ANN, it runs too slow.  No query compl

Re: Unexplained stuck memtable flush

2024-11-07 Thread Dmitry Konstantinov
Hi Bowen, lastSyncedAt is updated by taking pollStarted: lastSyncedAt = pollStarted; where long pollStarted = clock.now(); the logic uses clock from: SyncRunnable sync = new SyncRunnable(preciseTime) preciseTime by default is SystemClock clock.now() -> org.apache.cassandra.utils.Clock.Default#nano

Re: Unexplained stuck memtable flush

2024-11-07 Thread Bowen Song via user
The syncComplete is an instance of WaitQueue, and the syncComplete.queue is an instance of ConcurrentLinkedQueue. Surprisingly, the queue is empty. There is no item in the queue's linked list, only the head and tail nodes, each has item=null. The usage of the WaitQueue within the AbstractCommi

Re: Unexplained stuck memtable flush

2024-11-06 Thread Bowen Song via user
Sorry, I mistook CassandraTableWriteHandler for CassandraKeyspaceWriteHandler, and thought the code had changed. I was wrong, the code has not changed, they are two different files. Your read-hotness-tracker thread dump is: "read-hotness-tracker:1" - Thread t@113 java.lang.Thread.State: WAITI

Re: Unexplained stuck memtable flush

2024-11-06 Thread Bowen Song via user
I can see some similarities and some differences between your thread dump and ours. In your thread dump: * no MemtableFlushWriter thread * the MemtablePostFlush thread is idle * the MemtableReclaimMemory thread is waiting for a barrier, possibly stuck * the read-hotness-tracker thread is

Re: Unexplained stuck memtable flush

2024-11-06 Thread Jaydeep Chovatia
We have also seen this issue a few times in our production (4.1). My teammates added a thread dump here . One of my theories is

Re: Unexplained stuck memtable flush

2024-11-06 Thread Bowen Song via user
I think I'm getting really close now. This seems to have something to do with the "read-hotness-tracker:1" thread. The thread dump is: "read-hotness-tracker:1" daemon prio=5 tid=93 WAITING     at jdk.internal.misc.Unsafe.park(Native Method)     at java.util.concurrent.locks.LockSupport.park(Lock

Re: Upgrade from 4 to 5 issue

2024-11-05 Thread Joe Obernberger
Found issue - num tokens was set incorrectly in my container. Upgrade successful! -Joe On 11/5/2024 2:27 PM, Joe Obernberger wrote: Hi all - getting an error trying to upgrade our 4.x cluster to 5.  The following message repeats over and over and then the pod crashes: Heap dump creation on u

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
I will give it a try and see what I can find. I plan to go down the rabbit hole tomorrow. Will keep you updated. On 05/11/2024 17:34, Jeff Jirsa wrote: On Nov 5, 2024, at 4:12 AM, Bowen Song via user wrote: Writes on this node starts to timeout and fail. But if left untouched, it's only

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
Funny enough, we used to run on ext4 and XFS on mdarray RAID1, but the crappy disks we had (and still have) randomly spitting out garbage data every once in a while. We suspected it's a firmware bug but unable to confirm or reliably reproduce it. Other than this behaviour, those disks work fine

Re: Unexplained stuck memtable flush

2024-11-05 Thread Jeff Jirsa
> On Nov 5, 2024, at 4:12 AM, Bowen Song via user > wrote: > > Writes on this node starts to timeout and fail. But if left untouched, it's > only gonna get worse, and eventually lead to JVM OOM and crash. > > By inspecting the heap dump created at OOM, we can see that both of the > Memtable

Re: Unexplained stuck memtable flush

2024-11-05 Thread Jon Haddad
Yeah, I looked through your stack trace and saw it wasn't the same thing, but the steps to identify the root cause should be the same. I nuked ZFS from orbit :) This was happening across all the machines at various times in the cluster, and we haven't seen a single issue since switching to XFS.

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
Hi Jon, That is interesting. We happen to be running Cassandra on ZFS. However we have not had any incident for years with this setup, the only change is the recent addition of CDC. I can see that in CASSANDRA-19564, the MemtablePostFlush thread was stuck on the unlink() syscall. But in our

Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
Hinted hand off is a best effort approach, and relying on it alone is a bad idea. Hints can get lost due to a number of reasons, such as getting too old or too big, or the node storing the hints dies. You should rely on regular repair to guarantee the correctness of the data. You may use hinted

Re: Unexplained stuck memtable flush

2024-11-05 Thread Jon Haddad
I ran into this a few months ago, and in my case I tracked it down to an issue with ZFS not unlinking commitlogs properly. https://issues.apache.org/jira/browse/CASSANDRA-19564 On Tue, Nov 5, 2024 at 6:05 AM Dmitry Konstantinov wrote: > I am speaking about a thread dump (stack traces for all th

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
Sorry, I must have misread it. The full thread dump is attached. I compressed it with gzip because the text file is over 1 MB in size. On 05/11/2024 14:04, Dmitry Konstantinov wrote: I am speaking about a thread dump (stack traces for all threads), not a heap dump. The heap dump should contain

Re: Migration Cassandra to a new data center

2024-11-05 Thread edi mari
Thank you for your reply, Bowen. Correct, the questions were about migrating the server hardware to a new location, not the Cassandra DC. Wouldn’t it be a good idea to use the hints to complete the data to DC3? I'll extend the hint window (e.g., to one week) and allow the other data centers (DC1 a

Re: Unexplained stuck memtable flush

2024-11-05 Thread Dmitry Konstantinov
I am speaking about a thread dump (stack traces for all threads), not a heap dump. The heap dump should contain thread stacks info. Thread dump (stack traces) is small and does not have sensitive info. Regards, Dmitry On Tue, 5 Nov 2024 at 13:53, Bowen Song via user wrote: > It's about 18GB in

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
It's about 18GB in size and may contain a huge amount of sensitive data (e.g. all the pending writes), so I can't share it. However, if there's any particular piece of information you would like to have, I'm more than happy to extract the info from the dump and share it here. On 05/11/2024

Re: Unexplained stuck memtable flush

2024-11-05 Thread Dmitry Konstantinov
Hi Bowen, would it be possible to share a full thread dump? Regards, Dmitry On Tue, 5 Nov 2024 at 12:12, Bowen Song via user wrote: > Hi all, > > We have a cluster running Cassandra 4.1.1. We are seeing the memtable > flush randomly getting stuck. This has happened twice in the last 10 days, >

Re: [External]Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
user *Sent:* Tuesday, November 5, 2024 1:39 PM *To:* user@cassandra.apache.org *Cc:* Bowen Song *Subject:* Re: [External]Unexplained stuck memtable flush This message is from an EXTERNAL SENDER - be CAUTIOUS, particularly with links and attachments. Please report all suspicious e-mails to

RE: [External]Unexplained stuck memtable flush

2024-11-05 Thread Jiri Steuer (EIT)
, regards Jiri This item's classification is Internal. It was created by and is in property of the EmbedIT. Do not distribute outside of the organization. From: Bowen Song via user Sent: Tuesday, November 5, 2024 1:39 PM To: user@cassandra.apache.org Cc: Bowen Song Subject: Re: [Ext

Re: [External]Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
Hi Jiri, Thank you for taking a look at this issue. But I'm sorry, I don't really understand your message. Can you please elaborate? Cheers, Bowen On 05/11/2024 12:34, Jiri Steuer (EIT) wrote: Hi all, It is possible easy to check the moment/milestone, when the data cross more data centers

RE: [External]Unexplained stuck memtable flush

2024-11-05 Thread Jiri Steuer (EIT)
Hi all, It is possible easy to check the moment/milestone, when the data cross more data centers will by synch (in case that other applications and user access will be disabled)? I think about monitoring of throughput or …? Thx for feedback J. Steuer This item's classification is Internal

Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
You just confirmed my suspicion. You are indeed referring to both physical location of servers and the logical Cassandra DC with the same term here. The questions are related to the procedure of migrating the server hardware to a new location, not the Cassandra DC. Assuming that the IP addre

Re: Migration Cassandra to a new data center

2024-11-05 Thread edi mari
Each physical data center corresponds to a "logical" Cassandra DC (a group of nodes). In our situation, we need to move one of our physical data centers (i.e., the server rooms) to a new location, which will involve an extended period of downtime. Thanks Edi On Tue, Nov 5, 2024 at 1:27 PM Bowen S

Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
From the way you wrote this, I suspect the name DC may have a different meaning here. Are you talking about the physical location (i.e. server rooms), or the Cassandra DC (i.e. group of nodes for replication purposes)? On 05/11/2024 11:01, edi mari wrote: Hello, We have a Cassandra cluster deploy

Re: Kleppmann's idea of data loss - is it valid?

2024-10-29 Thread Sebastian Marsching
Hi Mike, I only skimmed through the article, but I think that the basic argument made there is valid, when using a high number of VNodes in a large cluster. That’s exactly why such a configuration is discouraged. Please refer to the detailed article at https://jolynch.github.io/pdf/cassandra-a
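The vnode argument in the message above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the thread: tokens are placed uniformly at random, replicas are the next `rf` distinct nodes clockwise on the ring, and racks and datacenters are ignored. It is not Cassandra's actual token allocator. The function names `replica_sets` and `loss_fraction` are made up for this illustration.

```python
import math
import random


def replica_sets(num_nodes, vnodes_per_node, rf, seed=0):
    """Return the distinct replica groups produced by random token placement.

    Assumes num_nodes >= rf; otherwise the clockwise walk below cannot finish.
    """
    rng = random.Random(seed)
    # Each node gets `vnodes_per_node` random tokens on the ring.
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(vnodes_per_node)
    )
    owners = [node for _, node in ring]
    groups = set()
    for start in range(len(owners)):
        # Walk clockwise from this token and take the first `rf` distinct nodes.
        replicas = []
        i = start
        while len(replicas) < rf:
            node = owners[i % len(owners)]
            if node not in replicas:
                replicas.append(node)
            i += 1
        groups.add(frozenset(replicas))
    return groups


def loss_fraction(num_nodes, vnodes_per_node, rf, seed=0):
    """Fraction of all rf-node combinations that hold every replica of some range."""
    possible = math.comb(num_nodes, rf)
    return len(replica_sets(num_nodes, vnodes_per_node, rf, seed)) / possible


if __name__ == "__main__":
    for vnodes in (1, 4, 16, 256):
        print(vnodes, round(loss_fraction(10, vnodes, 3), 3))
```

With one token per node, only neighbouring nodes share data, so few combinations of `rf` simultaneous failures remove all replicas of a range. As the number of vnodes grows, the fraction of such combinations rises in this model, which is the effect the article's argument relies on.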

Re: Cross-Node Latency Issues

2024-10-25 Thread Naman kaushik
Hi Manish, Just to clarify, our queries are using only the partition key column, with no additional filtering. It appears Cassandra is automatically logging ALLOW FILTERING, which may be misleading. Since we’re operating strictly on the partition key, we shouldn’t face the usual performance issues

Re: Cross-Node Latency Issues

2024-10-25 Thread MyWorld
Hi, This was a one-time issue for which we are looking for the RCA. Generally P99 latencies of all the tables are less than 12ms. There was a jump of a few ms in P99 on one of the nodes at this time at coordinator level. The CL is LOCAL_QUORUM. Another error we noticed in system log at the same time on

Re: Cross-Node Latency Issues

2024-10-24 Thread Bowen Song via user
Even big cloud providers, like GCP and AWS, can have temporary and minor network issues every now and then. If it was the result of an increase in packet loss for a short duration, TCP retransmission may be of interest. Have a look at /proc/net/netstat on Linux and you will find the relevant me
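As a rough sketch of how the retransmission counters mentioned above could be read, the snippet below parses text in the `/proc/net/netstat` layout, where each section appears as a line of counter names followed by a line of values with the same prefix. The helper names `parse_netstat` and `retransmit_counters` and the sample values are invented for illustration. The set of counters differs between kernel versions, so the name filter is an assumption rather than a definitive list.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {section: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stats = {}
    # Lines come in pairs: a header line of names, then a line of values,
    # both prefixed with the same section label (e.g. "TcpExt:").
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        stats[section] = dict(zip(names.split(), map(int, nums.split())))
    return stats


def retransmit_counters(stats):
    """Keep only the TcpExt counters whose name mentions retransmission."""
    tcp_ext = stats.get("TcpExt", {})
    return {k: v for k, v in tcp_ext.items() if "retrans" in k.lower()}


# Made-up sample in the same layout as the real file.
SAMPLE = """TcpExt: SyncookiesSent TCPLostRetransmit TCPFastRetrans TCPSynRetrans
TcpExt: 0 12 340 7
IpExt: InNoRoutes InOctets
IpExt: 0 123456
"""

if __name__ == "__main__":
    for name, value in sorted(retransmit_counters(parse_netstat(SAMPLE)).items()):
        print(f"{name}: {value}")
```

On a Linux host, passing `open("/proc/net/netstat").read()` instead of `SAMPLE` and comparing two readings taken some time apart would show whether the counters grew during the period of elevated latency.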

Re: Cross-Node Latency Issues

2024-10-24 Thread Stéphane Alleaume
Can you share DDL about table and the keyspace related ? Any interesting informations about the queries and CL associated ? Are you running read repair between DC ? NTP (time server) is well configured for all nodes in the 2 datacenters ? No antivirus activity on these nodes ? Kind regards S

Re: [External]Cross-Node Latency Issues

2024-10-24 Thread Naman kaushik
Hi jiri, Thank you for your feedback. I understand that using ALLOW FILTERING can lead to longer processing times and impact our SLA. However, I want to clarify that the queries in question are running only on the partition key. The WHERE clause is utilizing only the partition column for filterin

Re: Cross-Node Latency Issues

2024-10-24 Thread Bowen Song via user
Can you be more explicit about the "latency metrics from Grafana" you looked at? What percentile latencies were you looking at? Any aggregation used? You can post the underlying queries used for the dashboard if that's easier than explaining it. In general you should only care about the max, no

Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-24 Thread Bowen Song via user
Is one tombstone scanned per query causing any issue? I mean real issues, not the scanning of tombstones itself. On 24/10/2024 04:56, Naman kaushik wrote: Thanks everyone for your responses. We have columns with list and list types, and after using sstabledump, we found that the tombston

Re: Cross-Node Latency Issues

2024-10-24 Thread manish khandelwal
Hi Naman, If you are querying on a non-partition-key column (which seems to be the case here), then please know that this is an anti-pattern for Cassandra. For a small dataset it may work, but for a large dataset it may take longer or simply time out. The reason for this is that Cassandra scans each record if querie

Re: Cross-Node Latency Issues

2024-10-23 Thread MyWorld
How many sstables per read ? -- >> 1 to 4 sstables Are your partitions unbounded ? >> No What max size are the partitions ? >> P99 varies from few bytes to 70KB while max partition of tables varies from few bytes to 900 KB On Thu, Oct 24, 2024 at 11:00 AM Stéphane Alleaume wrote: > Hi, > > How

Re: Cross-Node Latency Issues

2024-10-23 Thread Stéphane Alleaume
Hi, How many sstables per read? Are your partitions unbounded? What max size are the partitions? Kind regards Stéphane On 24 October 2024 06:25:05 GMT+02:00, Naman kaushik wrote: >Hello everyone, > >We are currently using Cassandra 4.1.3 in a two-data-center cluster. >Recently, we

Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-23 Thread Naman kaushik
Thanks everyone for your responses. We have columns with list and list types, and after using sstabledump, we found that the tombstones are being generated due to these columns. I’ve encountered another issue related to tombstones in a table that is not involved in any write operations, as it is s

Re: [External]Cross-Node Latency Issues

2024-10-23 Thread Jiri Steuer (EIT)
Hi Naman, To be honest, with ALLOW FILTERING you can expect longer processing times and you cannot keep the SLA. Can you use indexes and remove ALLOW FILTERING? NOTE: you can use SAI in version 5 Regards Jiri Sent from Outlook for Android This item's clas

Re: Resources on Using Single Vnode in Cassandra

2024-10-09 Thread Long Pan
Thanks a lot, Jordan, Jeff, Abe, guo and Jon! This is really helpful. On Wed, Oct 9, 2024 at 8:59 AM Jon Haddad wrote: > I've worked with a few hundred teams now, including the major ones that > used single token (Apple, Netflix, Spotify), and pretty much all the rest > used some form of vnodes.

Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-09 Thread Jeff Jirsa
The easiest option here, though still unpleasant, is to sstabledump to JSON and look at the tombstone. Usually when this happens it’s because something unexpected is happening: actually writing nulls, doing deletes, or using weird short TTLs without realizing it. Dump the sstable and look, it’ll be faster th
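
A rough sketch of how a JSON dump produced by sstabledump could be scanned for tombstones. It assumes deletions show up as objects carrying a `deletion_info` key, and it walks the whole document recursively so it does not depend on the exact nesting of partitions, rows and cells. The function name and the sample document are hypothetical; a real dump should be checked against this assumption before relying on the result.

```python
import json


def find_deletion_markers(node, path="$"):
    """Return (path, deletion_info) pairs for every object holding a deletion_info key."""
    found = []
    if isinstance(node, dict):
        if "deletion_info" in node:
            found.append((path, node["deletion_info"]))
        for key, child in node.items():
            if key != "deletion_info":
                found.extend(find_deletion_markers(child, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found.extend(find_deletion_markers(child, f"{path}[{index}]"))
    return found


# Hypothetical dump fragment: one row whose "tags" column carries a deletion marker.
SAMPLE_DUMP = json.loads("""
[
  {
    "partition": {"key": ["k1"]},
    "rows": [
      {
        "type": "row",
        "cells": [
          {"name": "tags", "deletion_info": {"marked_deleted": "2024-10-09T00:00:00Z"}},
          {"name": "tags", "value": "a"}
        ]
      }
    ]
  }
]
""")

if __name__ == "__main__":
    for path, info in find_deletion_markers(SAMPLE_DUMP):
        print(path, info)
```

Each reported path points at the object that holds the marker, which makes it easier to tell whether the tombstone sits on a partition, a row, or a single column.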

Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-09 Thread Jon Haddad
Overwriting non collections does not generate tombstones on compaction. — Jon Haddad Rustyrazorblade Consulting rustyrazorblade.com On Wed, Oct 9, 2024 at 9:57 AM James Shaw wrote: > Hi, Naman: > How does the client side load large amounts of data ? Most > likely it has a retry polic

Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-09 Thread James Shaw
Hi, Naman: How does the client side load large amounts of data? Most likely it has a retry policy, either by the tool's default or by a setting. If it has a retry policy, there will be duplicated records, and after compaction there will be tombstones. Another possibility: the client sets a TTL, which will overri
