[
https://issues.apache.org/jira/browse/CASSANDRA-18388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706425#comment-17706425
]
Brandon Williams commented on CASSANDRA-18388:
----------------------------------------------
I was unable to find any changes that seem like they might affect this, and I
don't know of any similar reports, but 3.11.4 is four years old, so I would try
reproducing on the latest release to see whether it persists.
> Potential loop in RebufferingInputStream.read()
> -----------------------------------------------
>
> Key: CASSANDRA-18388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18388
> Project: Cassandra
> Issue Type: Bug
> Reporter: Bill Kuhhirte
> Priority: Normal
>
> We are running a 3-node cluster of Cassandra 3.11.4 nodes and use the Stratio
> Lucene secondary index. As part of a maintenance cycle, we may rebuild the
> secondary index on discrete nodes using `nodetool rebuild_index` and for the
> most part this runs without issues. Recently we had a problem where the
> compaction job for the index rebuild stalled and refused to accept the stop
> commands from `nodetool stop`. When looking at the stack trace for C*, we
> saw the following:
> {code:java}
> "CompactionExecutor:3" #404 daemon prio=1 os_prio=4 tid=0x00007f4be4f74670 nid=0x7b6f runnable [0x00007f45c6c97000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:98)
>         at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66)
>         at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
>         at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402)
>         at org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:434)
>         at org.apache.cassandra.db.ClusteringPrefix$Serializer.deserializeValuesWithoutSize(ClusteringPrefix.java:359)
>         at org.apache.cassandra.db.Clustering$Serializer.deserialize(Clustering.java:163)
>         at org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:285)
>         at org.apache.cassandra.db.Serializers$NewFormatSerializer.deserialize(Serializers.java:178)
>         at org.apache.cassandra.db.Serializers$NewFormatSerializer.deserialize(Serializers.java:154)
>         at org.apache.cassandra.io.sstable.IndexInfo$Serializer.deserialize(IndexInfo.java:133)
>         at org.apache.cassandra.io.sstable.IndexInfo$Serializer.deserialize(IndexInfo.java:82)
>         at org.apache.cassandra.db.RowIndexEntry$ShallowInfoRetriever.fetchIndex(RowIndexEntry.java:977)
>         at org.apache.cassandra.db.RowIndexEntry$FileIndexInfoRetriever.columnsIndex(RowIndexEntry.java:1017)
>         at org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.index(AbstractSSTableIterator.java:578)
>         at org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.binarySearch(AbstractSSTableIterator.java:636)
>         at org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.indexFor(AbstractSSTableIterator.java:627)
>         at org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.findBlockIndex(AbstractSSTableIterator.java:590)
>         at org.apache.cassandra.db.columniterator.SSTableIterator$ForwardIndexedReader.setForSlice(SSTableIterator.java:241)
>         at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.slice(AbstractSSTableIterator.java:279)
>         at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:264)
>         at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:119)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:48)
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:525)
>         at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:385)
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:39)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:932)
>         at com.stratio.cassandra.lucene.IndexWriterWide.commit(IndexWriterWide.scala:85)
>         at com.stratio.cassandra.lucene.IndexWriter.$anonfun$finish$1(IndexWriter.scala:121)
>         at com.stratio.cassandra.lucene.IndexWriter$$Lambda$500/362304413.apply$mcV$sp(Unknown Source)
>         at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
>         at com.stratio.cassandra.lucene.util.Locker.run(Locker.scala:32)
>         - locked <0x000000045e809558> (a java.lang.Object)
>         at com.stratio.cassandra.lucene.IndexWriter.finish(IndexWriter.scala:121)
>         at org.apache.cassandra.index.SecondaryIndexManager$$Lambda$499/1551142940.accept(Unknown Source)
>         at java.lang.Iterable.forEach(Iterable.java:75)
>         at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:625)
>         at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:625)
>         at org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:71)
>         at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1637)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>         at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$14/1431530910.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> I took three different jstack dumps over ~1 hour and the thread stayed at the
> same spot (the end of the while loop in `RebufferingInputStream.read()`). The
> whole compaction was stalled for > 6 hours with no movement in the bytes
> completed. With nodes holding only 23G total, it seems unlikely that it would
> take that much time to complete on its own.
> The code for RebufferingInputStream doesn't appear to have changed
> substantially, and unless I am misreading it, the value of `remaining` must
> still be zero when `toCopy` is calculated for the loop to fail to break out.
> Can anyone suggest why that might be? Running `sstabledump` against the local
> SSTables indicates they are in good shape, and each one takes only ~40 seconds
> to read fully.
>
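For reference, the failure mode the reporter describes can be sketched in a few lines of Java. This is illustrative only, not the actual Cassandra source: the class `ReadLoopSketch`, the simulated `reBuffer()`, and the spin guard are hypothetical stand-ins. The point it demonstrates is the arithmetic in the report: if a rebuffer ever yields a buffer with zero bytes remaining and the EOF return path is not taken, `toCopy = min(len - copied, remaining)` evaluates to 0, `copied` never advances, and the while loop can never terminate on its own.

```java
import java.nio.ByteBuffer;

// Illustrative sketch (NOT the actual Cassandra code) of the copy loop
// described in the report. The spin guard below stands in for what would
// otherwise be an infinite loop, so the failure mode is observable.
public class ReadLoopSketch {
    // Hypothetical rebuffer: simulates the suspected bad state by always
    // yielding a buffer with zero bytes remaining.
    static ByteBuffer reBuffer() {
        return ByteBuffer.allocate(0);
    }

    // Returns bytes copied, or -1 if the loop makes no progress.
    static int read(byte[] out, int len) {
        ByteBuffer buffer = reBuffer();
        int copied = 0;
        int spins = 0;
        while (copied < len) {
            int remaining = buffer.remaining();
            if (remaining == 0) {
                buffer = reBuffer();           // yields another empty buffer
                remaining = buffer.remaining();
                // A correct implementation must treat remaining == 0 here as
                // EOF and return; without that, we fall through with 0.
            }
            int toCopy = Math.min(len - copied, remaining); // 0 when remaining == 0
            buffer.get(out, copied, toCopy);   // copies nothing when toCopy == 0
            copied += toCopy;                  // no progress -> loop repeats
            if (toCopy == 0 && ++spins > 3)
                return -1;                     // guard: a real spin loops forever
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(ReadLoopSketch.read(new byte[16], 16));
    }
}
```

Under this (assumed) scenario the thread stays RUNNABLE while doing no work, which matches the repeated jstack samples pinned at the same frame with no movement in bytes completed.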
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]