I took the "reset the world" approach; things are much better now and the hints table is staying empty. It's a bit disconcerting that it could get so large and not be able to recover by itself, but at least there was a solution. Thanks
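For anyone reading this thread later, the grep output quoted further down can be tallied per endpoint to see whether deliveries ever finish or keep timing out. The following is only a minimal sketch: the regular expressions are written against the exact message wording that appears in this thread (Cassandra 1.0.8) and may not match other versions, and `tally_hint_events` and `SAMPLE` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Patterns written against the HintedHandOffManager messages quoted in this
# thread (Cassandra 1.0.8). Assumption: other versions may word these differently.
PATTERNS = {
    "started": re.compile(r"Started hinted handoff for token: \d+ with IP: /(\S+)"),
    "finished": re.compile(r"Finished hinted handoff of (\d+) rows to endpoint /(\S+)"),
    "timed_out": re.compile(r"Timed out replaying hints to /([^;\s]+)"),
    "died": re.compile(r"Endpoint /(\S+) died before hint delivery"),
}


def tally_hint_events(lines):
    """Count hint-delivery events per (endpoint, event) and rows delivered per endpoint."""
    counts = Counter()
    rows = Counter()
    for line in lines:
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if not match:
                continue
            # The endpoint is always the last capture group of each pattern.
            endpoint = match.group(match.lastindex)
            counts[(endpoint, event)] += 1
            if event == "finished":
                rows[endpoint] += int(match.group(1))
            break
    return counts, rows


# Four lines copied from the log excerpt in this thread.
SAMPLE = [
    "INFO [HintedHandoff:1] 2012-03-13 16:13:22,215 HintedHandOffManager.java (line 373) Finished hinted handoff of 852703 rows to endpoint /192.168.20.3",
    "INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting",
    "INFO [HintedHandoff:1] 2012-03-13 16:15:44,362 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3",
    "INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries",
]

counts, rows = tally_hint_events(SAMPLE)
for (endpoint, event), n in sorted(counts.items()):
    print(endpoint, event, n)
```

To run it against a real log, pass an open file handle for the system log instead of `SAMPLE`. Many "started" events with few "finished" events for the same endpoint would be consistent with the delivery trouble described below.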
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, March 15, 2012 7:24 PM
To: user@cassandra.apache.org
Subject: Re: Large hints column family

These messages make it look like the node is having trouble delivering hints.

INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting
INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries

Take another look at the logs on this machine and on 20.4 and 20.3. I would be looking into why so many hints are being stored. GC? Are there also logs about dropped messages?

If you want to reset the world, make sure the nodes have all run repair and then drop the hints, either via JMX or by stopping the node and deleting the files on disk.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/03/2012, at 12:58 PM, Bryce Godfrey wrote:

We were having some occasional memory pressure issues, but we added more RAM to the nodes a few days ago and things are running more smoothly now. In general, nodes have not been going up and down.

I tried to do a "list HintsColumnFamily" from cassandra-cli and it locks up my Cassandra node and never returns, forcing me to kill the Cassandra process and restart it to get the node back.
Here are my settings, which I believe are the defaults since I don't remember changing them:

hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50

Grepping for "Hinted" in the system log, I get these:

INFO [HintedHandoff:1] 2012-03-13 16:13:22,215 HintedHandOffManager.java (line 373) Finished hinted handoff of 852703 rows to endpoint /192.168.20.3
INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting
INFO [ScheduledTasks:1] 2012-03-13 16:15:32,569 StatusLogger.java (line 65) HintedHandoff 1 1 0
INFO [HintedHandoff:1] 2012-03-13 16:15:44,362 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
INFO [HintedHandoff:1] 2012-03-13 16:21:37,266 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
INFO [ScheduledTasks:1] 2012-03-13 16:23:07,662 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-13 16:25:49,330 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-13 16:30:52,503 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-13 16:42:22,202 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries
INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@661547256(34298224/74465815 serialized/live bytes, 78808 ops)
INFO [HintedHandoff:1] 2012-03-13 17:11:00,098 HintedHandOffManager.java (line 373) Finished hinted handoff of 44160 rows to endpoint /192.168.20.3
INFO [HintedHandoff:1] 2012-03-13 17:11:36,596 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /192.168.20.4
INFO [ScheduledTasks:1] 2012-03-13 17:12:25,248 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-13 18:47:56,151 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
INFO [ScheduledTasks:1] 2012-03-13 18:50:24,326 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:12:48,177 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:13:57,685 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:14:57,258 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:14:58,260 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:15:59,093 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:16:59,428 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:18:01,862 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:18:01,898 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:19:04,527 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:19:04,541 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:20:07,712 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:20:08,332 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-14 12:27:13,033 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
INFO [ScheduledTasks:1] 2012-03-15 15:05:00,954 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-15 15:06:07,750 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries
INFO [ScheduledTasks:1] 2012-03-15 15:06:07,802 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-15 15:06:07,809 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@254668880(103911/8312880 serialized/live bytes, 63877 ops)
INFO [ScheduledTasks:1] 2012-03-15 15:07:13,503 StatusLogger.java (line 65) HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-15 15:15:43,842 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, March 15, 2012 1:51 AM
To: user@cassandra.apache.org
Subject: Re: Large hints column family

Is there anything going on in the logs? Are nodes going up and down? Can you see any messages about delivering hints?

If the query to read the hints errors, it will log "HintsCF getEPPendingHints timed out" at INFO level.

Also checking: do the hinted_handoff_* settings in cassandra.yaml have their default values?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/03/2012, at 8:35 AM, Bryce Godfrey wrote:

Forgot to mention that this is on 1.0.8

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Wednesday, March 14, 2012 12:34 PM
To: user@cassandra.apache.org
Subject: Large hints column family

The system HintsColumnFamily seems large in my cluster, and I want to track down why that is. I tried invoking "listEndpointsPendingHints()" for o.a.c.db.HintedHandoffManager and it never returns, and it also freezes the node that it's invoked against. It's a 3 node cluster, and all nodes have been up and running without issue for a while. Any help on where to start with this?
Column Family: HintsColumnFamily
SSTable count: 11
Space used (live): 11271669539
Space used (total): 11271669539
Number of Keys (estimate): 1408
Memtable Columns Count: 338
Memtable Data Size: 0
Memtable Switch Count: 1
Read Count: 3
Read Latency: 4354.669 ms.
Write Count: 848
Write Latency: 0.029 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 12656
Key cache capacity: 14
Key cache size: 11
Key cache hit rate: 0.6666666666666666
Row cache: disabled
Compacted row minimum size: 105779
Compacted row maximum size: 7152383774
Compacted row mean size: 590818614

Thanks,
Bryce
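
The byte counts in the stats above are easier to judge once converted to larger units. As a rough sketch (not part of the original exchange), the snippet below parses a few of the "Key: value" lines and prints the live size and the largest compacted row in GiB. `parse_cfstats` and `SAMPLE` are hypothetical names, the sample is a subset of the output posted above, and the row sizes reported by Cassandra should be read as estimates.

```python
def parse_cfstats(text):
    """Turn 'Key: value' lines from the stats output into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


# Subset of the HintsColumnFamily stats posted in this thread.
SAMPLE = """\
Column Family: HintsColumnFamily
SSTable count: 11
Space used (live): 11271669539
Number of Keys (estimate): 1408
Compacted row maximum size: 7152383774
Compacted row mean size: 590818614
"""

GIB = 1024 ** 3

stats = parse_cfstats(SAMPLE)
live_bytes = int(stats["Space used (live)"])
max_row = int(stats["Compacted row maximum size"])

print(f"live space:  {live_bytes / GIB:.1f} GiB")
print(f"largest row: {max_row / GIB:.1f} GiB")
print(f"largest row as share of live space: {max_row / live_bytes:.0%}")
```

With these numbers the column family holds about 10.5 GiB on disk and the largest row alone is about 6.7 GiB, which would be consistent with the multi-second read latency and the timeouts when listing or replaying hints.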