Is this happening in one node or all. Did you try to delete the hints via
JMX in other nodes?


On Mon, Jan 20, 2014 at 12:18 PM, Allan C <alla...@gmail.com> wrote:

> Hi ,
>
> I’m hitting a very odd issue with HintedHandoff on 1 node in my 12 node
> cluster running 1.2.13. Somehow it’s holding a large amount of hints for
> tokens that have never been part of the cluster. Pretty sure this is
> causing a bunch of memory pressure somehow that’s causing the node to go
> down.
>
> I’d like to find out if I can just reset by deleting the hints CF or if
> there’s actually important data in there. I’m tempted to clear the CF and
> hope that fixes it, but a few nodes have been up and down (especially this
> one) since my last repair and I worry that I won’t be able to get through a
> full repair given the problems with the node currently.
>
> Here’s what I see so far:
>
>
> * listEndpointsPendingHints returns a list of about 20 tokens that are not
> part of the ring and have never been part of it. I’m not using vnodes,
> fwiw. deleteHintsForEndpoint doesn’t work. It tells me that the there’s no
> host for the token.
>
>
> * The hints CF is oddly large:
>
>      Column Family: hints
> SSTable count: 260
> Space used (live): 124904685
> Space used (total): 124904685
> SSTable Compression Ratio: 0.394676439667606
> Number of Keys (estimate): 66560
> Memtable Columns Count: 0
> Memtable Data Size: 0
> Memtable Switch Count: 14
> Read Count: 113
> Read Latency: 757.123 ms.
> Write Count: 987
> Write Latency: 0.044 ms.
> Pending Tasks: 0
> Bloom Filter False Positives: 10
> Bloom Filter False Ratio: 0.00209
> Bloom Filter Space Used: 6528
> Compacted row minimum size: 36
> Compacted row maximum size: 107964792
> Compacted row mean size: 787505
> Average live cells per slice (last five minutes): 0.0
>
>
> * I get this assertion in the logs often:
>
> ERROR [CompactionExecutor:81] 2014-01-20 
> 12:31:22,652<http://airmail.calendar/2014-01-20%2012:31:22%20PST> 
> CassandraDaemon.java
> (line 191) Exception in thread Thread[CompactionExecutor:81,1,main]
> java.lang.AssertionError: originally calculated column size of 71868452
> but now it is 71869026
>         at
> org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
>         at
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
>         at
> org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>         at
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at
> org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ERROR [HintedHandoff:52] 2014-01-20 
> 12:31:22,652<http://airmail.calendar/2014-01-20%2012:31:22%20PST> 
> CassandraDaemon.java
> (line 191) Exception in thread Thread[HintedHandoff:52,1,main]
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.AssertionError: originally calculated column size of 71868452 but
> now it is 71869026
>         at
> org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
>         at
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
>         at
> org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
>         at
> org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.AssertionError: originally calculated column size of 71868452 but
> now it is 71869026
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at
> org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
>         ... 6 more
> Caused by: java.lang.AssertionError: originally calculated column size of
> 71868452 but now it is 71869026
>         at
> org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
>         at
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
>         at
> org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>         at
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at
> org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         ... 3 more
>
>
> * I see a similar error when I try to compact the hints CF, even when I
> set in_memory_compaction_limit_in_mb as high as 1024.
>
> This started after I had brought up a few new nodes last week and then
> decommissioned them a few days later. The adding and decommissioning
> appeared to go uneventfully.
>
>
> If anyone has seen anything like this or can give me some hints on how to
> determine if the hints can be deleted, I’d greatly appreciate it.
>
> -Allan
>

Reply via email to