Here is what I can tell you from that trace, assuming this is the correct
thread and it really hangs there:

The validation is stuck while reading from an SSTable.
Unfortunately I am no Caffeine expert. It looks like the read goes through
the chunk cache, and after the read Caffeine tries to drain its read buffer,
which is where the thread is stuck. I can't see the reason from that stack
trace alone.
Someone would have to dig deeper into Caffeine to find the root cause.
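
One way to check whether the thread really keeps returning to the same place is to take several stack samples and see which outer frames they all share. Below is a minimal sketch of that idea in plain Java. The class and method names (StuckThreadSampler, commonOuterFrames, sample) are made up for illustration, and sample() only sees threads of the JVM it runs in, so for a live Cassandra node the samples would more likely come from repeated thread dumps or JConsole.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cassandra or Caffeine.
class StuckThreadSampler {

    /**
     * Returns the frames shared by every sample, counted from the thread's
     * entry point upwards. The result is ordered innermost first, so element 0
     * is the deepest frame that never changed between samples.
     */
    static List<StackTraceElement> commonOuterFrames(List<StackTraceElement[]> samples) {
        List<StackTraceElement> common = new ArrayList<>();
        if (samples.isEmpty()) {
            return common;
        }
        StackTraceElement[] first = samples.get(0);
        // The last array element is the entry point (e.g. Thread.run).
        for (int fromEnd = 1; fromEnd <= first.length; fromEnd++) {
            StackTraceElement candidate = first[first.length - fromEnd];
            for (StackTraceElement[] sample : samples) {
                if (fromEnd > sample.length
                        || !candidate.equals(sample[sample.length - fromEnd])) {
                    Collections.reverse(common);
                    return common;
                }
            }
            common.add(candidate);
        }
        Collections.reverse(common);
        return common;
    }

    /**
     * Samples the first thread in this JVM whose name contains nameFragment
     * (for example "ValidationExecutor"), pausing between samples.
     */
    static List<StackTraceElement[]> sample(String nameFragment, int count, long pauseMillis)
            throws InterruptedException {
        List<StackTraceElement[]> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                if (e.getKey().getName().contains(nameFragment)) {
                    samples.add(e.getValue());
                    break;
                }
            }
            Thread.sleep(pauseMillis);
        }
        return samples;
    }
}
```

If the innermost shared frame stays at something like ChunkCache$CachingRebufferer.rebuffer across many samples while the frames above it change, that would suggest a loop at that level rather than a blocked thread.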

2017-04-13 9:27 GMT+02:00 Roland Otta <roland.o...@willhaben.at>:

> i had a closer look at the validation executor thread (i hope that's what
> you meant)
>
> it seems the thread is always repeating stuff in
> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235)
>
> here is the full stack trace ...
>
> i am sorry .. but i have no clue what's happening there ..
>
> com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$64/2098345091.accept(Unknown Source)
> com.github.benmanes.caffeine.cache.BoundedBuffer$RingBuffer.drainTo(BoundedBuffer.java:104)
> com.github.benmanes.caffeine.cache.StripedBuffer.drainTo(StripedBuffer.java:160)
> com.github.benmanes.caffeine.cache.BoundedLocalCache.drainReadBuffer(BoundedLocalCache.java:964)
> com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(BoundedLocalCache.java:918)
> com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(BoundedLocalCache.java:903)
> com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run(BoundedLocalCache.java:2680)
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
> com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:875)
> com.github.benmanes.caffeine.cache.BoundedLocalCache.afterRead(BoundedLocalCache.java:748)
> com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1783)
> com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97)
> com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:66)
> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235)
> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:213)
> org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65)
> org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:59)
> org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:88)
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66)
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402)
> org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:420)
> org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:245)
> org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:610)
> org.apache.cassandra.db.rows.UnfilteredSerializer.lambda$deserializeRowBody$1(UnfilteredSerializer.java:575)
> org.apache.cassandra.db.rows.UnfilteredSerializer$$Lambda$84/898489541.accept(Unknown Source)
> org.apache.cassandra.utils.btree.BTree.applyForwards(BTree.java:1222)
> org.apache.cassandra.utils.btree.BTree.apply(BTree.java:1177)
> org.apache.cassandra.db.Columns.apply(Columns.java:377)
> org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:571)
> org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(UnfilteredSerializer.java:440)
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:95)
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:73)
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:122)
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:500)
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:360)
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
> org.apache.cassandra.db.rows.UnfilteredRowIterators.digest(UnfilteredRowIterators.java:178)
> org.apache.cassandra.repair.Validator.rowHash(Validator.java:221)
> org.apache.cassandra.repair.Validator.add(Validator.java:160)
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1364)
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:85)
> org.apache.cassandra.db.compaction.CompactionManager$13.call(CompactionManager.java:933)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/1371495133.run(Unknown Source)
> java.lang.Thread.run(Thread.java:745)
>
> On Thu, 2017-04-13 at 08:47 +0200, benjamin roth wrote:
>
> You should connect to the node with JConsole and see where the compaction
> thread is stuck
>
> 2017-04-13 8:34 GMT+02:00 Roland Otta <roland.o...@willhaben.at>:
>
> hi,
>
> we have the following issue on our 3.10 development cluster.
>
> we are doing regular repairs with thelastpickle's fork of creaper.
> sometimes the repair (it is a full repair in that case) hangs because
> of a stuck validation compaction
>
> nodetool compactionstats gives me
> a1bb45c0-1fc6-11e7-81de-0fb0b3f5a345 Validation      bds      ad_event 805955242 841258085 bytes 95.80%
> there has been no further progress here for hours
>
> nodetool tpstats shows
> ValidationExecutor                1         1          16186         0                 0
>
> i checked the logs on the affected node and could not find any
> suspicious errors.
>
> anyone that already had this issue and knows how to cope with that?
>
> a restart of the node helps to finish the repair ... but i am not sure
> whether that somehow breaks the full repair
>
> bg,
> roland
>
>
>
