Caches might be it; I'll try reducing them and see how it goes.
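For reference, here's roughly what I'm planning to try. This is only a sketch, assuming the pre-1.1 per-column-family cache model; the keyspace/column family names and sizes below are placeholders, not our real ones:

    # check current cache capacities and hit rates
    nodetool -h localhost cfstats

    # shrink the key/row cache for one column family at runtime
    # (takes effect immediately, but is not persisted across restarts)
    nodetool -h localhost setcachecapacity MyKeyspace MyColumnFamily 100000 0

    # persist the change in the schema via cassandra-cli:
    #   update column family MyColumnFamily with keys_cached=100000 and rows_cached=0;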
One thing I didn't mention before, because I wasn't seeing the same errors yesterday, but last night I thought it might be worth raising: I had changed the location of some data last week, and a few days after that I started receiving IOException errors for permissions issues. When that occurred, the high CPU usage appeared almost simultaneously. I fixed the permissions issues and restarted. After that I would run into the high CPU situation once every 6-24 hours, but no longer saw the IOExceptions. A restart of the node would fix it temporarily. Yesterday I ran a scrub on each of the nodes before restarting, and so far the issue hasn't returned. Probably a coincidence, but thought I'd mention it anyway.

On 2012-01-25, at 7:49 PM, aaron morton wrote:

> You are running into GC issues.
>
>>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>
> Can you reduce the size of the caches?
>
> As you are under low load, does it correlate with compaction or repair processes? Check nodetool compactionstats.
>
> Do you have wide rows? Check the max row size with nodetool cfstats.
>
> Also, if you have made any changes to the default memory and GC settings, try reverting them.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/01/2012, at 5:24 AM, Vitalii Tymchyshyn wrote:
>
>> According to the log, I don't see much time spent on GC. You can still check it with jstat or uncomment GC logging in cassandra-env.sh. Are you sure you've identified the thread correctly?
>> It's still possible that you have a memory spike where GCInspector simply has no chance to run between full GC rounds. Checking with jstat or adding GC logging may help to diagnose this.
>>
>> 25.01.12 17:24, Matthew Trinneer wrote:
>>> Here is a snippet of what I'm getting out of system.log for GC. Does anything in there provide a clue?
>>>
>>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) Heap is 0.7767292149986439 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>>> INFO [ScheduledTasks:1] 2012-01-22 12:54:57,685 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 240 ms for 1 collections, 111478936 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-22 15:12:21,710 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 1141 ms for 1 collections, 167667688 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-23 14:20:32,862 GCInspector.java (line 123) GC for ParNew: 205 ms for 1 collections, 2894546328 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-23 20:25:06,541 GCInspector.java (line 123) GC for ParNew: 240 ms for 1 collections, 4602331064 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-24 13:24:57,473 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 27869 ms for 1 collections, 6376733632 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-24 13:25:24,879 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 26306 ms for 1 collections, 6392079368 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-24 13:27:12,991 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 238 ms for 1 collections, 131710776 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-24 13:55:48,326 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 609 ms for 1 collections, 50380160 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-24 14:34:41,392 GCInspector.java (line 123) GC for ParNew: 325 ms for 1 collections, 1340375240 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-24 20:55:19,636 GCInspector.java (line 123) GC for ParNew: 233 ms for 1 collections, 6387236992 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-25 14:43:28,921 GCInspector.java (line 123) GC for ParNew: 337 ms for 1 collections, 7031219304 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-25 14:43:51,043 GCInspector.java (line 123) GC for ParNew: 211 ms for 1 collections, 7025723712 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-25 14:50:00,012 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 51534 ms for 2 collections, 6844998736 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-25 14:51:22,249 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 250 ms for 1 collections, 154848440 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-25 14:57:46,519 GCInspector.java (line 123) GC for ParNew: 244 ms for 1 collections, 190838344 used; max is 8547991552
>>> INFO [ScheduledTasks:1] 2012-01-25 15:00:21,693 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 389 ms for 1 collections, 28748448 used; max is 8547991552
>>>
>>> On 2012-01-25, at 10:09 AM, Vitalii Tymchyshyn wrote:
>>>
>>>> Hello.
>>>>
>>>> What's in the logs? It should output something like "Hey, you've got most of your memory used. I am going to flush some of the memtables." Sorry, I don't remember the exact wording, but it comes from GC handling, so it should be greppable by "GC".
>>>>
>>>> 25.01.12 16:26, Matthew Trinneer wrote:
>>>>> Hello Community,
>>>>>
>>>>> I am troubleshooting an issue with sudden and sustained high CPU on nodes in a 3-node cluster. This takes place when there is minimal load on the servers, and continues indefinitely until I stop and restart a node. All nodes (3) seem to be affected by the same issue, however it doesn't occur simultaneously on them (although all could potentially be affected at the same time).
>>>>>
>>>>> Here's what I've been doing to troubleshoot.
>>>>>
>>>>> * vmstat - have run on all servers and there is *no* swap going on
>>>>>
>>>>> * iostat - have run on all servers and there is no indication that the disk is in any way a bottleneck
>>>>>
>>>>> * nodetool - nothing is backed up. Not using all available heap (about 2/3rds).
>>>>>
>>>>> * Have tracked it down to a specific thread (top -H). When I find that thread's hex equivalent in a jstack dump I see the following:
>>>>>
>>>>> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000001d48800 nid=0x4534 runnable
>>>>>
>>>>> Which makes me think that perhaps it's something to do with GC configuration. Unfortunately I'm not able to determine why this might be happening. Any suggestions on how to continue troubleshooting?
>>>>>
>>>>> btw - here's my jvm (just in case)
>>>>>
>>>>> java version "1.6.0_22"
>>>>> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Matt
>>>>>
>>
>
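PS - a note for my own follow-up on Vitalii's jstat / GC-logging suggestion above. This is just a sketch of how I plan to watch GC, not verified against our exact cassandra-env.sh; the pid and log path are placeholders:

    # sample heap occupancy and GC time every 5 seconds for the Cassandra pid
    jstat -gcutil <cassandra-pid> 5000

    # or uncomment the GC logging block in conf/cassandra-env.sh, e.g.:
    # JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    # JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    # JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    # JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"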
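And in case it's useful to anyone else chasing the same thing, this is roughly how I matched the hot thread from top to the jstack entry quoted above (the pid is a placeholder; 17716 is just the decimal form of the nid=0x4534 thread I pasted earlier):

    top -H -p <cassandra-pid>              # note the TID of the thread burning CPU
    printf '%x\n' 17716                    # convert the decimal TID to hex -> 4534
    jstack <cassandra-pid> | grep 'nid=0x4534'   # find the matching thread in the dump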