It happened again today and I had a bit more time to probe. It seems all
non-periodic tasks execute on a single thread, so if that thread were to
get stuck, work would simply pile up until the node runs out of memory. I
did a series of stack dumps and they always looked something like this:

"NonPeriodicTasks:1" #103 daemon prio=5 os_prio=0 tid=0x00007febe8342400
> nid=0x4103 runnable [0x00007febc78ed000]
>    java.lang.Thread.State: RUNNABLE
> at$7.computeNext(
> at
> at
> at
> com.github.benmanes.caffeine.cache.LocalCache.invalidateAll(
> at
> com.github.benmanes.caffeine.cache.LocalManualCache.invalidateAll(
> at
> org.apache.cassandra.cache.ChunkCache.invalidateFile(
> at
> at
> Source)
> at java.util.Optional.ifPresent(
> at
> at
> org.apache.cassandra.utils.concurrent.Ref$GlobalState.release(
> at
> org.apache.cassandra.utils.concurrent.Ref$State.ensureReleased(
> at org.apache.cassandra.utils.concurrent.Ref.ensureReleased(
> at
> org.apache.cassandra.utils.concurrent.SharedCloseableImpl.close(
> at

And the thread executing these tasks would always be at 100% CPU.
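
To illustrate the pile-up described above, here is a minimal sketch using
plain java.util.concurrent. It is not Cassandra's actual executor code: the
class name, method name and the latch-based "stuck" task are all made up for
illustration, and the executor is only assumed to have the same shape (one
worker thread, unbounded queue).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Cassandra.
class SingleThreadBacklogDemo {

    // Blocks the only worker with one task, submits `extra` more tasks,
    // and returns how many of them are left waiting in the queue.
    static int backlogBehindStuckTask(int extra) throws InterruptedException {
        // Assumed shape: a single worker thread fed by an unbounded queue.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            executor.execute(() -> {
                started.countDown();
                try {
                    release.await(); // stand-in for a task that never finishes
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await(); // wait until the worker is occupied by the stuck task
            for (int i = 0; i < extra; i++) {
                executor.execute(() -> { }); // cheap tasks that can never start
            }
            return executor.getQueue().size();
        } finally {
            // Unblock the worker so the demo can shut down cleanly.
            release.countDown();
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("queued behind stuck task: " + backlogBehindStuckTask(1000));
    }
}
```

Every task submitted after the stuck one stays in the queue, and nothing
bounds that queue, so the backlog (and whatever each task references) grows
for as long as the worker is busy.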

One would expect that invalidating a local cache would be a cheap operation.
Yet it's not. What could cause chunk cache invalidation to be slow?
Cassandra does seem to be using an old version of Caffeine, and there have
been issues with it in
the past where it would go into an endless loop under the wrong set of

On Mon, 3 Aug 2020 at 13:52, jelmer wrote:

> It did look like there were repairs running at the time. The
> LiveSSTableCount for the entire node is about 2200 tables; for the keyspace
> that was being repaired it's just 150.
> We run Cassandra 3.11.6, so we should be unaffected by CASSANDRA-14096.
> We use for the repairs
> On Sat, 1 Aug 2020 at 01:49, Erick Ramirez wrote:
>> I don't have specific experience relating to InstanceTidier but when I
>> saw this, I immediately thought of repairs blowing up the heap. 40K
>> instances indicates to me that you have thousands of SSTables -- are they
>> tiny (like 1MB or less)? Otherwise, are they dense nodes (~1TB or more)?
>> How do you run repairs? I'm wondering if it's possible that there are
>> multiple repairs running in parallel like a cron job kicking in while the
>> previous repair is still running.
>> You didn't specify your C* version but my guess is that it's pre-3.11.5.
>> FWIW the repair issue I'm referring to is CASSANDRA-14096 [1].
>> [1]
