It certainly makes sense not to allow it. Any idea why the node would be holding hints for tokens that don’t exist?
-Allan

On January 20, 2014 at 1:09:51 PM, sankalp kohli (kohlisank...@gmail.com) wrote:

Yes, as per the code you cannot delete hints for endpoints which are not part of the ring:

    if (!StorageService.instance.getTokenMetadata().isMember(endpoint))
        return;

On Mon, Jan 20, 2014 at 12:34 PM, Allan C <alla...@gmail.com> wrote:

There are 3 other nodes that have a mild case. This one node is worse by an order of magnitude. deleteHintsForEndpoint fails with the same error on any of the affected nodes.

-Allan

On January 20, 2014 at 12:24:33 PM, sankalp kohli (kohlisank...@gmail.com) wrote:

Is this happening on one node or all? Did you try to delete the hints via JMX on the other nodes?

On Mon, Jan 20, 2014 at 12:18 PM, Allan C <alla...@gmail.com> wrote:

Hi,

I’m hitting a very odd issue with HintedHandoff on 1 node in my 12-node cluster running 1.2.13. Somehow it’s holding a large number of hints for tokens that have never been part of the cluster. I’m pretty sure this is causing memory pressure that’s taking the node down. I’d like to find out whether I can just reset by deleting the hints CF, or whether there’s actually important data in there. I’m tempted to clear the CF and hope that fixes it, but a few nodes have been up and down (especially this one) since my last repair, and I worry that I won’t be able to get through a full repair given the node’s current problems.

Here’s what I see so far:

* listEndpointsPendingHints returns a list of about 20 tokens that are not part of the ring and have never been part of it. I’m not using vnodes, fwiw. deleteHintsForEndpoint doesn’t work; it tells me that there’s no host for the token.

* The hints CF is oddly large:

    Column Family: hints
    SSTable count: 260
    Space used (live): 124904685
    Space used (total): 124904685
    SSTable Compression Ratio: 0.394676439667606
    Number of Keys (estimate): 66560
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 14
    Read Count: 113
    Read Latency: 757.123 ms.
    Write Count: 987
    Write Latency: 0.044 ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 10
    Bloom Filter False Ratio: 0.00209
    Bloom Filter Space Used: 6528
    Compacted row minimum size: 36
    Compacted row maximum size: 107964792
    Compacted row mean size: 787505
    Average live cells per slice (last five minutes): 0.0

* I get this assertion in the logs often:

    ERROR [CompactionExecutor:81] 2014-01-20 12:31:22,652 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:81,1,main]
    java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

    ERROR [HintedHandoff:52] 2014-01-20 12:31:22,652 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:52,1,main]
    java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
        at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
        at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
        ... 6 more
    Caused by: java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        ... 3 more

* I see a similar error when I try to compact the hints CF, even when I set in_memory_compaction_limit_in_mb as high as 1024.

This started after I brought up a few new nodes last week and then decommissioned them a few days later. The adding and decommissioning appeared to go uneventfully.

If anyone has seen anything like this or can give me some hints on how to determine whether the hints can be deleted, I’d greatly appreciate it.

-Allan
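[Editor's note: the guard sankalp quotes from HintedHandOffManager explains why the JMX delete is a no-op here. A toy sketch of that behaviour — the classes below are simplified stand-ins for illustration, not Cassandra's real TokenMetadata or hints store:]

```java
import java.util.HashSet;
import java.util.Set;

// Illustrates the quoted guard: a delete request for an endpoint that is
// not a ring member is dropped, so hints keyed by tokens that were never
// in the ring cannot be removed through this code path.
public class HintGuardSketch {
    static final Set<String> ringMembers = new HashSet<>();
    static final Set<String> pendingHintEndpoints = new HashSet<>();

    static void deleteHintsForEndpoint(String endpoint) {
        // mirrors: if (!StorageService.instance.getTokenMetadata().isMember(endpoint)) return;
        if (!ringMembers.contains(endpoint))
            return; // not a ring member: the delete is silently ignored
        pendingHintEndpoints.remove(endpoint);
    }

    public static void main(String[] args) {
        ringMembers.add("10.0.0.1");
        pendingHintEndpoints.add("10.0.0.1"); // hint for a live member
        pendingHintEndpoints.add("10.9.9.9"); // hint for an endpoint never in the ring

        deleteHintsForEndpoint("10.0.0.1");
        deleteHintsForEndpoint("10.9.9.9");

        // Only the member's hints are gone; the orphaned ones survive.
        System.out.println(pendingHintEndpoints); // prints [10.9.9.9]
    }
}
```

Which matches what Allan reports: listEndpointsPendingHints still shows the orphaned tokens, and deleteHintsForEndpoint cannot touch them.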
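[Editor's note: on the AssertionError itself — LazilyCompactedRow serializes a row in two passes, first computing the expected serialized size and then writing the columns and re-checking the byte count; if the column set resolves differently between the passes (e.g. corrupt or overlapping sstable fragments), the "originally calculated column size of X but now it is Y" assertion fires. A minimal illustration of that failure pattern, not Cassandra's code — all names and sizes below are made up:]

```java
import java.util.ArrayList;
import java.util.List;

// Two-pass write pattern: pass 1 sizes the row, pass 2 writes it and
// compares. A row that changes between the passes trips the check.
public class TwoPassWriteSketch {

    // Pass 1: compute the serialized size we expect to write.
    static long computeSize(List<byte[]> columns) {
        long size = 0;
        for (byte[] c : columns)
            size += c.length;
        return size;
    }

    // Pass 2: "write" the columns, returning the bytes actually emitted.
    static long writeColumns(List<byte[]> columns) {
        long written = 0;
        for (byte[] c : columns)
            written += c.length;
        return written;
    }

    public static void main(String[] args) {
        List<byte[]> row = new ArrayList<>();
        row.add(new byte[100]);
        long expected = computeSize(row);  // pass 1 sees 100 bytes

        // The column set resolves differently before the second pass:
        row.add(new byte[574]);
        long actual = writeColumns(row);   // pass 2 sees 674 bytes

        System.out.println("originally calculated column size of " + expected
                + " but now it is " + actual);
    }
}
```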