Bill Kuhhirte created CASSANDRA-18388:
-----------------------------------------

             Summary: Potential loop in RebufferingInputStream.read()
                 Key: CASSANDRA-18388
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18388
             Project: Cassandra
          Issue Type: Bug
            Reporter: Bill Kuhhirte


We are running a 3-node cluster of Cassandra 3.11.4 nodes and use the Stratio 
Lucene secondary index.  As part of a maintenance cycle, we may rebuild the 
secondary index on discrete nodes using `nodetool rebuild_index` and for the 
most part this runs without issues.  Recently we had a problem where the 
compaction job for the index rebuild stalled and refused to accept the stop 
commands from `nodetool stop`.  When looking at the stack trace for C*, we saw 
the following:


{code:java}
"CompactionExecutor:3" #404 daemon prio=1 os_prio=4 tid=0x00007f4be4f74670 
nid=0x7b6f runnable [0x00007f45c6c97000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:98)
        at 
org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66)
        at 
org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
        at 
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402)
        at 
org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:434)
        at 
org.apache.cassandra.db.ClusteringPrefix$Serializer.deserializeValuesWithoutSize(ClusteringPrefix.java:359)
        at 
org.apache.cassandra.db.Clustering$Serializer.deserialize(Clustering.java:163)
        at 
org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:285)
        at 
org.apache.cassandra.db.Serializers$NewFormatSerializer.deserialize(Serializers.java:178)
        at 
org.apache.cassandra.db.Serializers$NewFormatSerializer.deserialize(Serializers.java:154)
        at 
org.apache.cassandra.io.sstable.IndexInfo$Serializer.deserialize(IndexInfo.java:133)
        at 
org.apache.cassandra.io.sstable.IndexInfo$Serializer.deserialize(IndexInfo.java:82)
 
        at 
org.apache.cassandra.db.RowIndexEntry$ShallowInfoRetriever.fetchIndex(RowIndexEntry.java:977)
        at 
org.apache.cassandra.db.RowIndexEntry$FileIndexInfoRetriever.columnsIndex(RowIndexEntry.java:1017)
        at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.index(AbstractSSTableIterator.java:578)
        at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.binarySearch(AbstractSSTableIterator.java:636)
        at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.indexFor(AbstractSSTableIterator.java:627)
        at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.findBlockIndex(AbstractSSTableIterator.java:590)
        at 
org.apache.cassandra.db.columniterator.SSTableIterator$ForwardIndexedReader.setForSlice(SSTableIterator.java:241)
        at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator.slice(AbstractSSTableIterator.java:279)
        at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:264)
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
        at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
        at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:119)
        at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:48)
        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:525)
        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:385)
        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
        at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:39)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:932)
        at 
com.stratio.cassandra.lucene.IndexWriterWide.commit(IndexWriterWide.scala:85)
        at 
com.stratio.cassandra.lucene.IndexWriter.$anonfun$finish$1(IndexWriter.scala:121)
        at 
com.stratio.cassandra.lucene.IndexWriter$$Lambda$500/362304413.apply$mcV$sp(Unknown
 Source)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at com.stratio.cassandra.lucene.util.Locker.run(Locker.scala:32)
        - locked <0x000000045e809558> (a java.lang.Object)
        at 
com.stratio.cassandra.lucene.IndexWriter.finish(IndexWriter.scala:121)
        at 
org.apache.cassandra.index.SecondaryIndexManager$$Lambda$499/1551142940.accept(Unknown
 Source)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at 
org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:625)
        at 
org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:625)
        at 
org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:71)
        at 
org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1637)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$14/1431530910.run(Unknown
 Source)
        at java.lang.Thread.run(Thread.java:748)
{code}
I took three different jstack dumps over ~1 hour and it stayed at the same spot 
(the end of the while loop in `RebufferingInputStream.read()` The whole 
compaction was stalled for > 6 hours with no movement in the bytes completed.  
With nodes holding only 23G total, that seems unlikely that it would take that 
much time to complete on its own.  

The code for RebufferingInputStream doesn't appear to be substantially changed 
and unless I am reading things wrong, it seems like the value for `remaining` 
must still be zero when calculating `toCopy` to not be able to break out from 
the while loop.  

Can anyone provide suggestions as to why that might be?  Doing `sstabledump`s 
on the local SSTables seems to indicate they are in good shape and takes ~40 
seconds each to read them fully.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to