[ https://issues.apache.org/jira/browse/CASSANDRA-19776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950891#comment-17950891 ]
Stefan Miklosovic commented on CASSANDRA-19776:
-----------------------------------------------

_It's not compaction's job here to ensure that there is a live reference to the expired sstables for other uses._

I agree. That was just me describing what I saw, not judging whether it is good or bad. Referencing all of them, even the expired ones, while in compaction was just a cheap / fast way to ensure that calling that metric method will not spin.

> Spinning trying to capture readers
> ----------------------------------
>
>                 Key: CASSANDRA-19776
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19776
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: extract.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> On a handful of clusters we are noticing spin locks occurring. I traced back all the calls to the EstimatedPartitionCount metric (e.g. org.apache.cassandra.metrics:type=Table,keyspace=testks,scope=testcf,name=EstimatedPartitionCount) using the following patched function:
> {code:java}
> public RefViewFragment selectAndReference(Function<View, Iterable<SSTableReader>> filter)
> {
>     long failingSince = -1L;
>     boolean first = true;
>     while (true)
>     {
>         ViewFragment view = select(filter);
>         Refs<SSTableReader> refs = Refs.tryRef(view.sstables);
>         if (refs != null)
>             return new RefViewFragment(view.sstables, view.memtables, refs);
>         if (failingSince <= 0)
>         {
>             failingSince = System.nanoTime();
>         }
>         else if (System.nanoTime() - failingSince > TimeUnit.MILLISECONDS.toNanos(100))
>         {
>             List<SSTableReader> released = new ArrayList<>();
>             for (SSTableReader reader : view.sstables)
>                 if (reader.selfRef().globalCount() == 0)
>                     released.add(reader);
>             NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.SECONDS,
>                              "Spinning trying to capture readers {}, released: {}", view.sstables, released);
>             if (first)
>             {
>                 first = false;
>                 try
>                 {
>                     throw new RuntimeException("Spinning trying to capture readers");
>                 }
>                 catch (Exception e)
>                 {
>                     logger.warn("Spin lock stacktrace", e);
>                 }
>             }
>             failingSince = System.nanoTime();
>         }
>     }
> }
> {code}
> Digging into this code I found that it will fail if any of the sstables are in a released state (i.e. reader.selfRef().globalCount() == 0).
> See the attached extract.log for an example of one of these spin-lock occurrences. Sometimes these spin locks last over 5 minutes. On the worst cluster with this issue, I ran a log-processing script that, every time the 'Spinning trying to capture readers' message differed from the previous one, output whether the released sstables were in a Compacting state. Every single occurrence has it spin-locking with the released list containing an sstable that is compacting.
> In the extract.log example it is spin-locking saying that nb-320533-big-Data.db has been released, but you can see that prior to the spinning that sstable is involved in a compaction. The compaction completes at 01:03:36 and the spinning stops. nb-320533-big-Data.db is deleted at 01:03:49 along with the other 9 sstables involved in the compaction.
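The remark above about why the metric spins can be illustrated with a minimal, self-contained sketch. This is only an illustration under assumptions: ToyReader and captureAll are hypothetical names, and the counting shown is a simplified stand-in for Cassandra's actual Ref/Refs machinery, not its real API. It demonstrates the failure mode from the description: once a reader's global count reaches zero it can never be re-referenced, so a capture loop whose view still lists that reader keeps retrying until the view itself changes.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SpinSketch
{
    // Toy stand-in for an SSTableReader holding a global reference count.
    static class ToyReader
    {
        final String name;
        final AtomicInteger globalCount = new AtomicInteger(1); // the initial "self" reference

        ToyReader(String name) { this.name = name; }

        // Succeeds only while the count is still positive; once it has reached zero
        // the reader can never be captured again.
        boolean tryRef()
        {
            while (true)
            {
                int c = globalCount.get();
                if (c <= 0)
                    return false;
                if (globalCount.compareAndSet(c, c + 1))
                    return true;
            }
        }

        void release()
        {
            globalCount.decrementAndGet();
        }
    }

    // Toy version of the capture loop: retry until every reader in the view is
    // referenced, or give up after timeoutMillis. (The loop in the ticket never
    // gives up, which is why it can spin for minutes.)
    static boolean captureAll(List<ToyReader> view, long timeoutMillis)
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline)
        {
            List<ToyReader> acquired = new ArrayList<>();
            boolean allCaptured = true;
            for (ToyReader r : view)
            {
                if (r.tryRef())
                {
                    acquired.add(r);
                }
                else
                {
                    allCaptured = false;
                    for (ToyReader a : acquired) // roll back the partial refs and retry
                        a.release();
                    break;
                }
            }
            if (allCaptured)
                return true; // the caller would hold on to the acquired refs here
        }
        return false;
    }

    public static void main(String[] args)
    {
        ToyReader live = new ToyReader("nb-1-big-Data.db");
        ToyReader released = new ToyReader("nb-320533-big-Data.db");
        released.release(); // simulate a reader that is fully released but still listed in the view

        // Every pass fails on the released reader, so the loop retries until the timeout.
        System.out.println(captureAll(List.of(live, released), 200)
                           ? "captured all readers"
                           : "spun until timeout: a released reader is stuck in the view");
    }
}
{code}

In the sketch the spinning only ends when the view stops listing the released reader, which is consistent with the observation above that the spinning stopped as soon as the compaction involving nb-320533-big-Data.db completed.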