Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Dmitry Konstantinov
> thThreadPriority(Thread.MIN_PRIORITY)" when using executorFactory in
> https://github.com/apache/cassandra/blob/77a3e0e818df3cce45a974ecc977ad61bdcace47/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L2028
> should do it.
>
> Did I miss a reason to no longer use low priority threads for compaction?
> Should I open a bug for re-adding this feature / submit a PR?
>
> Regards,
>
> Pierre Fersing

--
Dmitry Konstantinov
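For readers unfamiliar with the idea discussed above, here is a minimal sketch of running background work on minimum-priority threads using only the plain JDK. It does not use Cassandra's executorFactory API; the class name `LowPriorityThreads`, the method names, and the thread-name prefix are hypothetical and chosen for illustration only. Whether the operating system actually honours Java thread priorities depends on the platform and JVM settings, so treat this as a sketch of the mechanism rather than a guaranteed "nice" equivalent.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Cassandra: plain-JDK illustration only.
final class LowPriorityThreads {

    // Builds a factory whose threads are daemons running at Thread.MIN_PRIORITY.
    static ThreadFactory lowPriorityFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger(1);
        return runnable -> {
            Thread t = new Thread(runnable, prefix + ":" + counter.getAndIncrement());
            t.setDaemon(true);
            t.setPriority(Thread.MIN_PRIORITY);
            return t;
        };
    }

    // Fixed-size pool whose worker threads all come from the factory above.
    static ExecutorService newLowPriorityPool(int threads, String prefix) {
        return Executors.newFixedThreadPool(threads, lowPriorityFactory(prefix));
    }
}
```

A pool created with `LowPriorityThreads.newLowPriorityPool(2, "compaction-sketch")` would then run submitted tasks on threads whose `getPriority()` returns `Thread.MIN_PRIORITY`.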

Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Dmitry Konstantinov
> efault with the Linux kernel, but
> that has changed since bfq and mq-deadline were added to the Linux kernel.
> Both bfq and mq-deadline support IO priority, as documented here:
> https://docs.kernel.org/block/ioprio.html
>
> On 22/02/2024 19:39, Dmitry Konstantinov wrote:

Re: Unexplained stuck memtable flush

2024-11-05 Thread Dmitry Konstantinov
> ne 1094
> <https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1094>
> in the same file.
>
>     try
>     {
>         // we wait on the latch for the commitLogUpperBound to be set, and so that waiters
>         // on this task can rely on all prior flushes being complete
>         latch.await(); // <--- stuck here
>     }
>
> Our top suspect is CDC interacting with repair, since this started to
> happen shortly after we enabled CDC on the nodes, and each time repair was
> running. But we have not been able to reproduce this in a testing cluster,
> and don't know what's the next step to troubleshoot this issue. So I'm
> posting it in the mailing lists and hoping someone may know something about
> it or point me to the right direction.
>
> p.s.: sorry about posting this to both the user & dev mailing lists. It's
> an end-user related issue but involves Cassandra internals, so I can't
> decide which one is best suited.
>
> Cheers,
> Bowen

--
Dmitry Konstantinov

Re: Unexplained stuck memtable flush

2024-11-05 Thread Dmitry Konstantinov
>> line 1190
>> <https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1190>
>> in the ColumnFamilyStore.java:
>>
>>     // mark writes older than the barrier as blocking progress, permitting them to exceed our memory limit
>>     // if they are stuck waiting on it, then wait for them all to complete
>>     writeBarrier.markBlocking();
>>     writeBarrier.await(); // <--- stuck here
>>
>> And the MemtablePostFlush thread is stuck on line 1094
>> <https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1094>
>> in the same file.
>>
>>     try
>>     {
>>         // we wait on the latch for the commitLogUpperBound to be set, and so that waiters
>>         // on this task can rely on all prior flushes being complete
>>         latch.await(); // <--- stuck here
>>     }
>>
>> Our top suspect is CDC interacting with repair, since this started to
>> happen shortly after we enabled CDC on the nodes, and each time repair was
>> running. But we have not been able to reproduce this in a testing cluster,
>> and don't know what's the next step to troubleshoot this issue. So I'm
>> posting it in the mailing lists and hoping someone may know something about
>> it or point me to the right direction.
>>
>> p.s.: sorry about posting this to both the user & dev mailing lists. It's
>> an end-user related issue but involves Cassandra internals, so I can't
>> decide which one is best suited.
>>
>> Cheers,
>> Bowen
>
> --
> Dmitry Konstantinov

--
Dmitry Konstantinov
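As a hedged troubleshooting aid for the kind of stuck `await()` calls quoted above, the sketch below lists the threads in the current JVM that are in the WAITING state with a given class name somewhere in their stack. It uses only the standard `java.lang.management` API; the class `WaitingThreadFinder` and its method are hypothetical names, not Cassandra code. It inspects only the JVM it runs in, so for a live node a thread dump taken from outside (for example with `jstack`) gives the same information.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra: plain-JDK illustration only.
final class WaitingThreadFinder {

    // Returns the names of WAITING threads whose stack contains a frame
    // from a class whose fully qualified name includes classNameFragment.
    static List<String> waitingThreadsIn(String classNameFragment) {
        List<String> names = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null || info.getThreadState() != Thread.State.WAITING) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().contains(classNameFragment)) {
                    names.add(info.getThreadName());
                    break;
                }
            }
        }
        return names;
    }
}
```

Calling `WaitingThreadFinder.waitingThreadsIn("WaitQueue")` inside a process would return the names of threads parked in a class matching that fragment, which is one way to narrow down which waiters to examine first.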

Re: Unexplained stuck memtable flush

2024-11-07 Thread Dmitry Konstantinov
al# *state* org.apache.cassandra.utils.concurrent.WaitQueue.Standard.RegisteredSignal# *thread* in the object here?:

"read-hotness-tracker:1" daemon prio=5 tid=93 WAITING
    at org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:289)

Regards,
Dmitry

On Thu, 7 Nov 2024 at 21:30, Dmitry Konstantinov

Re: Unexplained stuck memtable flush

2024-11-07 Thread Dmitry Konstantinov
> ere
> }
>
> Our top suspect is CDC interacting with repair, since this started to
> happen shortly after we enabled CDC on the nodes, and each time repair was
> running. But we have not been able to reproduce this in a testing cluster,
> and don't know what's the next step to troubleshoot this issue. So I'm
> posting it in the mailing lists and hoping someone may know something about
> it or point me to the right direction.
>
> Wouldn’t be completely surprised if CDC or repair somehow has a barrier,
> I’ve also seen similar behavior pre-3.0 with “very long running read
> commands” that have a barrier on the memtable that prevents release.
>
> You’ve got the heap (great, way better than most people debugging), are
> you able to navigate through it and look for references to that memtable or
> other things holding a barrier?

--
Dmitry Konstantinov

Re: Enable audit log

2025-01-14 Thread Dmitry Konstantinov
>> audit logs to be seen. After cassandra restart it
>> seems to be disabled again.
>> Anyone also came across that?
>>
>> Thx,
>> Sebastian.

--
Dmitry Konstantinov

Re: Cassandra Memory Spikes - Tuning Suggestions?

2025-02-27 Thread Dmitry Konstantinov
>>> fundamental process, these
>>> memory spikes make capacity planning difficult.
>>> I tried adjusting the following settings, but they did not have any
>>> effect on the spikes:
>>> • compaction_throughput_mb_per_sec
>>> • concurrent_compactors
>>>
>>> *Questions:*
>>> 1. Are there other settings I can tune to reduce memory spikes?
>>> 2. Could something else be causing these spikes apart from compaction?
>>>
>>> Would appreciate any insights on how to smooth out memory usage.
>>>
>>> - vignesh

--
Dmitry Konstantinov