> Adding ".withThreadPriority(Thread.MIN_PRIORITY)" when using
> executorFactory in
> https://github.com/apache/cassandra/blob/77a3e0e818df3cce45a974ecc977ad61bdcace47/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L2028
> should do it.
>
>
> Did I miss a reason to no longer use low-priority threads for compaction?
> Should I open a bug for re-adding this feature / submit a PR?
>
> Regards,
>
> Pierre Fersing
>
>
--
Dmitry Konstantinov
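For reference, an option like the one suggested above ultimately comes down to a ThreadFactory that lowers the priority of the threads it creates. A minimal, Cassandra-independent sketch (the class name here is illustrative, not Cassandra's API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class LowPriorityFactory implements ThreadFactory {
    private final ThreadFactory delegate = Executors.defaultThreadFactory();

    @Override
    public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r);
        t.setPriority(Thread.MIN_PRIORITY); // hint the scheduler to deprioritise this work
        return t;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, new LowPriorityFactory());
        // The pool's worker threads run at MIN_PRIORITY (1).
        Future<Integer> priority = pool.submit(() -> Thread.currentThread().getPriority());
        System.out.println(priority.get()); // 1
        pool.shutdown();
    }
}
```

Note that Java thread priority is only a scheduling hint; on Linux its effect depends on the JVM and OS configuration, which is part of why IO priority (below) matters too.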
efault with the Linux kernel, but
> that has changed since bfq and mq-deadline were added to the Linux kernel.
> Both bfq and mq-deadline support IO priority, as documented here:
> https://docs.kernel.org/block/ioprio.html
>
>
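For completeness: the JDK has no direct API for Linux IO priorities, so applying one to a running process typically means shelling out to util-linux's ionice (a real tool; the wrapper class below is illustrative). Class 3 is the idle IO class honored by bfq and mq-deadline:

```java
public class IoNiceCommand {
    // Builds the ionice invocation that would move a pid to the idle IO class.
    // -c selects the IO class (3 = idle), -p selects the target pid.
    static String[] idleIoPriority(long pid) {
        return new String[] { "ionice", "-c", "3", "-p", Long.toString(pid) };
    }

    public static void main(String[] args) {
        // On a real node this would be passed to ProcessBuilder; here we just
        // print the command that would be run.
        System.out.println(String.join(" ", idleIoPriority(1234)));
    }
}
```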
> On 22/02/2024 19:39, Dmitry Konstantinov wrote:
> And the MemtablePostFlush thread is stuck on line 1094
> <https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1094>
> in the same file.
>
> try
> {
>     // we wait on the latch for the commitLogUpperBound to be set, and so
>     // that waiters on this task can rely on all prior flushes being complete
>     latch.await(); // <--- stuck here
> }
> Our top suspect is CDC interacting with repair, since this started to
> happen shortly after we enabled CDC on the nodes, and each time repair was
> running. But we have not been able to reproduce this in a testing cluster,
> and we don't know what the next step is to troubleshoot this issue. So I'm
> posting it to the mailing lists, hoping someone may know something about
> it or point me in the right direction.
>
> p.s.: sorry about posting this to both the user & dev mailing lists. It's
> an end-user related issue but involves Cassandra internals, so I can't
> decide which one is best suited.
>
> Cheers,
> Bowen
>
>
>
--
Dmitry Konstantinov
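The stuck latch.await() above can be reproduced in miniature: if whoever is responsible for counting the latch down never runs, the waiter parks forever and shows up as WAITING in a thread dump, exactly like the MemtablePostFlush thread here. A self-contained sketch (illustrative, not Cassandra's flush code):

```java
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);

        // Stands in for the MemtablePostFlush task: parks until the latch
        // is counted down by whoever sets the commit log upper bound.
        Thread waiter = new Thread(() -> {
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "MemtablePostFlush-demo");
        waiter.start();

        Thread.sleep(200); // give the waiter time to park
        System.out.println(waiter.getState()); // WAITING -- what a thread dump shows

        // If this line never runs (the suspected failure mode), the waiter
        // stays WAITING forever and later flushes queue up behind it.
        latch.countDown();
        waiter.join();
        System.out.println(waiter.getState()); // TERMINATED
    }
}
```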
line 1190
>> <https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1190>
>> in the ColumnFamilyStore.java:
>>
>> // mark writes older than the barrier as blocking progress, permitting
>> // them to exceed our memory limit; if they are stuck waiting on it,
>> // then wait for them all to complete
>> writeBarrier.markBlocking();
>> writeBarrier.await(); // <--- stuck here
>>
>> And the MemtablePostFlush thread is stuck on line 1094
>> <https://github.com/apache/cassandra/blob/8d91b469afd3fcafef7ef85c10c8acc11703ba2d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1094>
>> in the same file.
>>
>> try
>> {
>>     // we wait on the latch for the commitLogUpperBound to be set, and so
>>     // that waiters on this task can rely on all prior flushes being complete
>>     latch.await(); // <--- stuck here
>> }
>> Our top suspect is CDC interacting with repair, since this started to
>> happen shortly after we enabled CDC on the nodes, and each time repair was
>> running. But we have not been able to reproduce this in a testing cluster,
>> and we don't know what the next step is to troubleshoot this issue. So I'm
>> posting it to the mailing lists, hoping someone may know something about
>> it or point me in the right direction.
>>
>> p.s.: sorry about posting this to both the user & dev mailing lists. It's
>> an end-user related issue but involves Cassandra internals, so I can't
>> decide which one is best suited.
>>
>> Cheers,
>> Bowen
>>
>>
>>
>
> --
> Dmitry Konstantinov
>
>
--
Dmitry Konstantinov
Could you check the values of
org.apache.cassandra.utils.concurrent.WaitQueue.Standard.RegisteredSignal#*state*
and
org.apache.cassandra.utils.concurrent.WaitQueue.Standard.RegisteredSignal#*thread*
in the object here?:

"read-hotness-tracker:1" daemon prio=5 tid=93 WAITING
    at org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:289)
Regards,
Dmitry
On Thu, 7 Nov 2024 at 21:30, Dmitry Konstantinov wrote:
>     latch.await(); // <--- stuck here
> }
> Our top suspect is CDC interacting with repair, since this started to
> happen shortly after we enabled CDC on the nodes, and each time repair was
> running. But we have not been able to reproduce this in a testing cluster,
> and we don't know what the next step is to troubleshoot this issue. So I'm
> posting it to the mailing lists, hoping someone may know something about
> it or point me in the right direction.
>
>
> Wouldn't be completely surprised if CDC or repair somehow holds a barrier;
> I've also seen similar behavior pre-3.0 with "very long running read
> commands" that hold a barrier on the memtable and prevent its release.
>
> You've got the heap (great, way better than most people debugging). Are
> you able to navigate through it and look for references to that memtable,
> or to other things holding a barrier?
>
--
Dmitry Konstantinov
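Besides walking the heap dump, the WAITING threads themselves can be located programmatically via JMX, which is handy for scripting a check on an affected node. A generic sketch (the stuck thread here is simulated, not Cassandra's):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class StuckFinder {
    public static void main(String[] args) throws InterruptedException {
        // Simulate a thread parked forever on a latch, like MemtablePostFlush.
        CountDownLatch never = new CountDownLatch(1);
        Thread stuck = new Thread(() -> {
            try { never.await(); } catch (InterruptedException e) { /* exit */ }
        }, "MemtablePostFlush-demo");
        stuck.setDaemon(true); // let the JVM exit despite the parked thread
        stuck.start();
        Thread.sleep(200);

        // Dump all threads and report those parked in WAITING state.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.WAITING
                    && info.getThreadName().contains("MemtablePostFlush")) {
                System.out.println(info.getThreadName() + " " + info.getThreadState());
            }
        }
    }
}
```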
audit logs to be seen. After a Cassandra restart it
>> seems to be disabled again.
>> Has anyone else come across that?
>>
>> Thx,
>> Sebastian.
>>
>
--
Dmitry Konstantinov
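A likely explanation for the behavior above: enabling the audit log at runtime (e.g. via nodetool enableauditlog) is not persisted. For it to survive a restart, it also has to be enabled in cassandra.yaml; a minimal sketch (option names as of Cassandra 4.x, verify against your version's defaults):

```yaml
# cassandra.yaml -- persists audit logging across restarts
audit_logging_options:
    enabled: true
    logger:
      - class_name: BinAuditLogger
```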
fundamental process, these
>>> memory spikes make capacity planning difficult.
>>> I tried adjusting the following settings, but they did not have any
>>> effect on the spikes:
>>> • compaction_throughput_mb_per_sec
>>> • concurrent_compactors
>>> *Questions:*
>>> 1. Are there other settings I can tune to reduce memory spikes?
>>> 2. Could something else be causing these spikes apart from compaction?
>>>
>>> Would appreciate any insights on how to smooth out memory usage.
>>>
>>> - vignesh
>>>
>>
--
Dmitry Konstantinov
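One way to narrow down the spikes: check whether they show up in the JVM's own memory counters at all. Compaction does much of its work through direct and memory-mapped buffers, so process RSS can spike while heap usage stays flat, and in that case heap-oriented tuning won't help. A generic JMX probe (not Cassandra-specific) to compare against RSS:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapProbe {
    public static void main(String[] args) {
        MemoryMXBean mx = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mx.getHeapMemoryUsage();
        MemoryUsage nonHeap = mx.getNonHeapMemoryUsage();
        // If RSS spikes while these stay flat, the growth is in direct or
        // native memory (compaction buffers, mmapped SSTables), not the heap.
        System.out.println("heap used (MB): " + heap.getUsed() / (1024 * 1024));
        System.out.println("non-heap used (MB): " + nonHeap.getUsed() / (1024 * 1024));
        System.out.println(heap.getUsed() > 0);
    }
}
```

Sampling this periodically alongside RSS (e.g. from /proc/self/status) shows which side of the JVM the spikes live on.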