Yes, as per the code, you cannot delete hints for endpoints which are not part of the ring:
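The effect of that one-line membership guard can be illustrated with a standalone toy sketch. This is not Cassandra code: the class, the plain sets standing in for TokenMetadata/StorageService, and all method names other than deleteHintsForEndpoint are hypothetical, chosen only to show why the delete silently does nothing for a non-member endpoint.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the ring-membership guard: hints can only be deleted
// for endpoints that the token metadata currently knows as ring members.
public class HintDeleteGuard {
    private final Set<String> ringMembers = new HashSet<>();
    private final Set<String> endpointsWithHints = new HashSet<>();

    void join(String endpoint)      { ringMembers.add(endpoint); }
    void storeHint(String endpoint) { endpointsWithHints.add(endpoint); }

    // Mirrors the shape of: if (!...getTokenMetadata().isMember(endpoint)) return;
    boolean deleteHintsForEndpoint(String endpoint) {
        if (!ringMembers.contains(endpoint)) {
            return false; // refused: endpoint is not a ring member
        }
        return endpointsWithHints.remove(endpoint);
    }

    public static void main(String[] args) {
        HintDeleteGuard g = new HintDeleteGuard();
        g.join("10.0.0.1");
        g.storeHint("10.0.0.1");
        g.storeHint("10.0.0.99"); // hints for an endpoint that never joined
        System.out.println(g.deleteHintsForEndpoint("10.0.0.1"));  // true: member, hints removed
        System.out.println(g.deleteHintsForEndpoint("10.0.0.99")); // false: not a ring member
    }
}
```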
    if (!StorageService.instance.getTokenMetadata().isMember(endpoint)) return;

On Mon, Jan 20, 2014 at 12:34 PM, Allan C <alla...@gmail.com> wrote:

> There are 3 other nodes that have a mild case. This one node is worse by
> an order of magnitude. deleteHintsForEndpoint fails with the same error
> on any of the affected nodes.
>
> -Allan
>
> On January 20, 2014 at 12:24:33 PM, sankalp kohli
> (kohlisank...@gmail.com) wrote:
>
> Is this happening in one node or all? Did you try to delete the hints
> via JMX on other nodes?
>
> On Mon, Jan 20, 2014 at 12:18 PM, Allan C <alla...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm hitting a very odd issue with HintedHandoff on 1 node in my 12-node
>> cluster running 1.2.13. Somehow it's holding a large number of hints
>> for tokens that have never been part of the cluster. I'm pretty sure
>> this is causing a bunch of memory pressure that's making the node go
>> down.
>>
>> I'd like to find out whether I can just reset by deleting the hints CF,
>> or whether there's actually important data in there. I'm tempted to
>> clear the CF and hope that fixes it, but a few nodes have been up and
>> down (especially this one) since my last repair, and I worry that I
>> won't be able to get through a full repair given the node's current
>> problems.
>>
>> Here's what I see so far:
>>
>> * listEndpointsPendingHints returns a list of about 20 tokens that are
>> not part of the ring and have never been part of it. I'm not using
>> vnodes, fwiw. deleteHintsForEndpoint doesn't work; it tells me that
>> there's no host for the token.
>>
>> * The hints CF is oddly large:
>>
>>   Column Family: hints
>>   SSTable count: 260
>>   Space used (live): 124904685
>>   Space used (total): 124904685
>>   SSTable Compression Ratio: 0.394676439667606
>>   Number of Keys (estimate): 66560
>>   Memtable Columns Count: 0
>>   Memtable Data Size: 0
>>   Memtable Switch Count: 14
>>   Read Count: 113
>>   Read Latency: 757.123 ms.
>>   Write Count: 987
>>   Write Latency: 0.044 ms.
>>   Pending Tasks: 0
>>   Bloom Filter False Positives: 10
>>   Bloom Filter False Ratio: 0.00209
>>   Bloom Filter Space Used: 6528
>>   Compacted row minimum size: 36
>>   Compacted row maximum size: 107964792
>>   Compacted row mean size: 787505
>>   Average live cells per slice (last five minutes): 0.0
>>
>> * I get this assertion in the logs often:
>>
>> ERROR [CompactionExecutor:81] 2014-01-20 12:31:22,652
>> CassandraDaemon.java (line 191) Exception in thread
>> Thread[CompactionExecutor:81,1,main]
>> java.lang.AssertionError: originally calculated column size of 71868452
>> but now it is 71869026
>>   at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
>>   at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
>>   at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>>   at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>   at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>>   at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>>   at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
>>   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:662)
>> ERROR [HintedHandoff:52] 2014-01-20 12:31:22,652
>> CassandraDaemon.java (line 191) Exception in thread
>> Thread[HintedHandoff:52,1,main]
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
>> java.lang.AssertionError: originally calculated column size of 71868452
>> but now it is 71869026
>>   at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
>>   at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
>>   at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
>>   at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.util.concurrent.ExecutionException:
>> java.lang.AssertionError: originally calculated column size of 71868452
>> but now it is 71869026
>>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>   at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
>>   ... 6 more
>> Caused by: java.lang.AssertionError: originally calculated column size
>> of 71868452 but now it is 71869026
>>   at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
>>   at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
>>   at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>>   at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>   at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>>   at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>>   at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
>>   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>   ... 3 more
>>
>> * I see a similar error when I try to compact the hints CF, even when I
>> set in_memory_compaction_limit_in_mb as high as 1024.
>>
>> This started after I brought up a few new nodes last week and then
>> decommissioned them a few days later. The adding and decommissioning
>> appeared to go uneventfully.
>>
>> If anyone has seen anything like this or can give me some hints on how
>> to determine whether the hints can be deleted, I'd greatly appreciate
>> it.
>>
>> -Allan
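For what it's worth, the AssertionError above comes out of LazilyCompactedRow.write, which serializes a row in two passes: one pass to compute the expected serialized size, and a second pass that writes the data and asserts the written size matches. The toy sketch below shows only that general two-pass shape and why the assertion fires if the data seen by the two passes differs; the class and method names are hypothetical, and this is not the actual Cassandra implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy two-pass writer: pass 1 sizes the row, pass 2 serializes it and
// asserts the sizes agree. If the data changes (or is read differently)
// between the passes, the mismatch surfaces as an error of the same
// shape as the one in the logs above.
public class TwoPassWriter {
    static long computeSize(List<String> columns) {
        long size = 0;
        for (String c : columns)
            size += c.getBytes(StandardCharsets.UTF_8).length;
        return size;
    }

    static long write(List<String> columns, long expectedSize) {
        long written = 0;
        for (String c : columns)
            written += c.getBytes(StandardCharsets.UTF_8).length;
        if (written != expectedSize) {
            throw new AssertionError("originally calculated column size of "
                    + expectedSize + " but now it is " + written);
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> row = new ArrayList<>(List.of("colA", "colB"));
        long expected = computeSize(row);   // pass 1: 8 bytes
        row.set(1, "colB-mutated");         // row changes between passes
        try {
            write(row, expected);           // pass 2: now 16 bytes
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```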