[ https://issues.apache.org/jira/browse/CASSANDRA-19776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954564#comment-17954564 ]

Stefan Miklosovic edited comment on CASSANDRA-19776 at 5/28/25 11:51 AM:
-------------------------------------------------------------------------

The "quick fix" approach was already suggested earlier and it looks like this 
(1). Cameron was asking whether doing this would prevent expired SSTables from 
being dropped until the end of the compaction. I think we are more 
knowledgeable at this point, and it was actually always the case: if we have 
unreferenced / obsolete SSTables, they will indeed be physically removed from 
disk at the very end of the compaction anyway. 

The "proper way" is in (2), if I understood [~blambov] correctly. If we put the 
expired SSTables into the "transaction.staged.obsolete" set, then in the 
checkpoint() method these SSTableReaders will not be released, because they 
will be present in staged.obsolete and we release only those which are not in 
that set:
{code:java}
accumulate = release(selfRefs(filterOut(toUpdate, staged.obsolete)), accumulate);
{code}
We eventually release the obsoleted readers on commit, here (3).

(1) [https://github.com/instaclustr/cassandra/commit/1b8992677c1817c2f1e3e802ea25e2d0fc30fa4f]
(2) [https://github.com/apache/cassandra/pull/4183/files]
(3) [https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/db/lifecycle/LifecycleTransaction.java#L252]
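
To make this concrete, here is a minimal, self-contained toy model of the 
checkpoint/commit release flow described above. ToyReader and ToyTransaction 
are hypothetical stand-ins, not Cassandra classes; the point is only that 
readers staged as obsolete are skipped at checkpoint() and released on commit().
{code:java}
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: ToyReader / ToyTransaction are hypothetical, not Cassandra classes.
public class ToyTransaction
{
    static class ToyReader
    {
        final String name;
        int refCount = 1;

        ToyReader(String name) { this.name = name; }

        void release()
        {
            refCount--;
            System.out.println(name + " released, refCount=" + refCount);
        }
    }

    // readers touched by the compaction, and the subset staged as obsolete (e.g. fully expired)
    private final Set<ToyReader> toUpdate = new LinkedHashSet<>();
    private final Set<ToyReader> stagedObsolete = new HashSet<>();

    void update(ToyReader reader)   { toUpdate.add(reader); }
    void obsolete(ToyReader reader) { stagedObsolete.add(reader); }

    // checkpoint(): release only readers that are NOT staged as obsolete,
    // i.e. the filterOut(toUpdate, staged.obsolete) idea from the snippet above
    void checkpoint()
    {
        for (ToyReader reader : toUpdate)
            if (!stagedObsolete.contains(reader))
                reader.release();
    }

    // commit(): obsoleted readers are released only here, at the very end of the
    // transaction, which is also when their files could actually be deleted
    void commit()
    {
        for (ToyReader reader : stagedObsolete)
            reader.release();
    }

    public static void main(String[] args)
    {
        ToyTransaction txn = new ToyTransaction();
        ToyReader live = new ToyReader("nb-1-big-Data.db");
        ToyReader expired = new ToyReader("nb-2-big-Data.db");
        txn.update(live);
        txn.update(expired);
        txn.obsolete(expired); // the expired sstable goes into staged.obsolete
        txn.checkpoint();      // releases only nb-1; nb-2 stays referenced
        txn.commit();          // nb-2 is released here, at the end
    }
}
{code}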



> Spinning trying to capture readers
> ----------------------------------
>
>                 Key: CASSANDRA-19776
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19776
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: extract.log
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> On a handful of clusters we are noticing spin locks occurring. I traced back 
> all the calls to the EstimatedPartitionCount metric (e.g. 
> org.apache.cassandra.metrics:type=Table,keyspace=testks,scope=testcf,name=EstimatedPartitionCount).
> Using the following patched function:
> {code:java}
>     public RefViewFragment selectAndReference(Function<View, Iterable<SSTableReader>> filter)
>     {
>         long failingSince = -1L;
>         boolean first = true;
>         while (true)
>         {
>             ViewFragment view = select(filter);
>             Refs<SSTableReader> refs = Refs.tryRef(view.sstables);
>             if (refs != null)
>                 return new RefViewFragment(view.sstables, view.memtables, refs);
>             if (failingSince <= 0)
>             {
>                 failingSince = System.nanoTime();
>             }
>             else if (System.nanoTime() - failingSince > TimeUnit.MILLISECONDS.toNanos(100))
>             {
>                 List<SSTableReader> released = new ArrayList<>();
>                 for (SSTableReader reader : view.sstables)
>                     if (reader.selfRef().globalCount() == 0)
>                         released.add(reader);
>                 NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.SECONDS,
>                                  "Spinning trying to capture readers {}, released: {}, ", view.sstables, released);
>                 if (first)
>                 {
>                     first = false;
>                     try {
>                         throw new RuntimeException("Spinning trying to capture readers");
>                     } catch (Exception e) {
>                         logger.warn("Spin lock stacktrace", e);
>                     }
>                 }
>                 failingSince = System.nanoTime();
>             }
>         }
>     }
> {code}
> Digging into this code, I found it will fail if any of the sstables are in a 
> released state (i.e. reader.selfRef().globalCount() == 0).
> See the attached extract.log for an example of one of these spin lock 
> occurrences. Sometimes these spin locks last over 5 minutes. On the cluster 
> worst affected by this issue, I ran a log processing script that, every time 
> the 'Spinning trying to capture readers' message differed from the previous 
> one, output whether the released tables were in a Compacting state. Every 
> single occurrence has it spin locking with 'released' listing an sstable that 
> is compacting.
> In the extract.log example it is spin locking saying that nb-320533-big-Data.db 
> has been released, but you can see that, prior to the spinning, that sstable is 
> involved in a compaction. The compaction completes at 01:03:36 and the spinning 
> stops. nb-320533-big-Data.db is deleted at 01:03:49 along with the other 9 
> sstables involved in the compaction.
>  
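
For illustration, here is a tiny standalone model of why the patched loop above 
spins: the try-ref step fails as soon as any candidate reader has already 
dropped to zero references, so the caller keeps retrying until the view no 
longer contains that reader. ToyReader and tryRefAll are hypothetical stand-ins, 
not the Cassandra Refs API (which also undoes any partial refs on failure).
{code:java}
import java.util.List;

// Hypothetical toy model of the "reference every reader or fail" step.
public class SpinDemo
{
    static class ToyReader
    {
        final String name;
        int globalCount;

        ToyReader(String name, int globalCount)
        {
            this.name = name;
            this.globalCount = globalCount;
        }

        // succeeds only while the reader still has live references
        boolean tryRef()
        {
            if (globalCount == 0)
                return false;
            globalCount++;
            return true;
        }
    }

    // true only if ALL readers could be referenced; otherwise the caller retries
    static boolean tryRefAll(List<ToyReader> readers)
    {
        for (ToyReader reader : readers)
            if (!reader.tryRef())
                return false; // one released reader fails the whole attempt
        return true;
    }

    public static void main(String[] args)
    {
        List<ToyReader> view = List.of(new ToyReader("nb-1-big-Data.db", 2),
                                       new ToyReader("nb-320533-big-Data.db", 0)); // already released
        System.out.println("captured readers: " + tryRefAll(view)); // false -> select() is retried
    }
}
{code}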


