It certainly makes sense not to allow it. Any idea why the node would be holding hints for tokens that don’t exist?
-Allan

On January 20, 2014 at 1:09:51 PM, sankalp kohli (kohlisank...@gmail.com) wrote:

Yes, as per the code you cannot delete hints for endpoints which are not part of the ring:

    if (!StorageService.instance.getTokenMetadata().isMember(endpoint))
        return;

On Mon, Jan 20, 2014 at 12:34 PM, Allan C <alla...@gmail.com> wrote:

There are 3 other nodes that have a mild case. This one node is worse by an order of magnitude. deleteHintsForEndpoint fails with the same error on any of the affected nodes.

-Allan

On January 20, 2014 at 12:24:33 PM, sankalp kohli (kohlisank...@gmail.com) wrote:

Is this happening on one node or all? Did you try to delete the hints via JMX on the other nodes?

On Mon, Jan 20, 2014 at 12:18 PM, Allan C <alla...@gmail.com> wrote:

Hi,

I’m hitting a very odd issue with HintedHandoff on 1 node in my 12-node cluster running 1.2.13. Somehow it’s holding a large number of hints for tokens that have never been part of the cluster. I’m pretty sure this is causing memory pressure that’s taking the node down. I’d like to find out whether I can just reset by deleting the hints CF, or whether there’s actually important data in there. I’m tempted to clear the CF and hope that fixes it, but a few nodes have been up and down (especially this one) since my last repair, and I worry that I won’t be able to get through a full repair given the node’s current problems.

Here’s what I see so far:

* listEndpointsPendingHints returns a list of about 20 tokens that are not part of the ring and have never been part of it. I’m not using vnodes, fwiw. deleteHintsForEndpoint doesn’t work; it tells me that there’s no host for the token.

* The hints CF is oddly large:

    Column Family: hints
    SSTable count: 260
    Space used (live): 124904685
    Space used (total): 124904685
    SSTable Compression Ratio: 0.394676439667606
    Number of Keys (estimate): 66560
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 14
    Read Count: 113
    Read Latency: 757.123 ms.
    Write Count: 987
    Write Latency: 0.044 ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 10
    Bloom Filter False Ratio: 0.00209
    Bloom Filter Space Used: 6528
    Compacted row minimum size: 36
    Compacted row maximum size: 107964792
    Compacted row mean size: 787505
    Average live cells per slice (last five minutes): 0.0

* I get this assertion in the logs often:

    ERROR [CompactionExecutor:81] 2014-01-20 12:31:22,652 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:81,1,main]
    java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

    ERROR [HintedHandoff:52] 2014-01-20 12:31:22,652 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:52,1,main]
    java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
        at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
        at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
        ... 6 more
    Caused by: java.lang.AssertionError: originally calculated column size of 71868452 but now it is 71869026
        at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        ... 3 more

* I see a similar error when I try to compact the hints CF, even when I set in_memory_compaction_limit_in_mb as high as 1024.

This started after I brought up a few new nodes last week and then decommissioned them a few days later. The adding and decommissioning appeared to go uneventfully.

If anyone has seen anything like this or can give me some hints on how to determine whether the hints can be deleted, I’d greatly appreciate it.

-Allan
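[Editor's note: the guard sankalp quotes from HintedHandOffManager explains why the JMX delete is a no-op here. A toy sketch of that behaviour — the classes below are simplified stand-ins for illustration, not Cassandra's real TokenMetadata or hints store:]

```java
import java.util.HashSet;
import java.util.Set;

// Illustrates the quoted guard: a delete request for an endpoint that is
// not a ring member is dropped, so hints keyed by tokens that were never
// in the ring cannot be removed through this code path.
public class HintGuardSketch {
    static final Set<String> ringMembers = new HashSet<>();
    static final Set<String> pendingHintEndpoints = new HashSet<>();

    static void deleteHintsForEndpoint(String endpoint) {
        // mirrors: if (!StorageService.instance.getTokenMetadata().isMember(endpoint)) return;
        if (!ringMembers.contains(endpoint))
            return; // not a ring member: the delete is silently ignored
        pendingHintEndpoints.remove(endpoint);
    }

    public static void main(String[] args) {
        ringMembers.add("10.0.0.1");
        pendingHintEndpoints.add("10.0.0.1"); // hint for a live member
        pendingHintEndpoints.add("10.9.9.9"); // hint for an endpoint never in the ring

        deleteHintsForEndpoint("10.0.0.1");
        deleteHintsForEndpoint("10.9.9.9");

        // Only the member's hints are gone; the orphaned ones survive.
        System.out.println(pendingHintEndpoints); // prints [10.9.9.9]
    }
}
```

Which matches what Allan reports: listEndpointsPendingHints still shows the orphaned tokens, and deleteHintsForEndpoint cannot touch them.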
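[Editor's note: on the AssertionError itself — LazilyCompactedRow serializes a row in two passes, first computing the expected serialized size and then writing the columns and re-checking the byte count; if the column set resolves differently between the passes (e.g. corrupt or overlapping sstable fragments), the "originally calculated column size of X but now it is Y" assertion fires. A minimal illustration of that failure pattern, not Cassandra's code — all names and sizes below are made up:]

```java
import java.util.ArrayList;
import java.util.List;

// Two-pass write pattern: pass 1 sizes the row, pass 2 writes it and
// compares. A row that changes between the passes trips the check.
public class TwoPassWriteSketch {

    // Pass 1: compute the serialized size we expect to write.
    static long computeSize(List<byte[]> columns) {
        long size = 0;
        for (byte[] c : columns)
            size += c.length;
        return size;
    }

    // Pass 2: "write" the columns, returning the bytes actually emitted.
    static long writeColumns(List<byte[]> columns) {
        long written = 0;
        for (byte[] c : columns)
            written += c.length;
        return written;
    }

    public static void main(String[] args) {
        List<byte[]> row = new ArrayList<>();
        row.add(new byte[100]);
        long expected = computeSize(row);  // pass 1 sees 100 bytes

        // The column set resolves differently before the second pass:
        row.add(new byte[574]);
        long actual = writeColumns(row);   // pass 2 sees 674 bytes

        System.out.println("originally calculated column size of " + expected
                + " but now it is " + actual);
    }
}
```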