Caches might be it.  Will try reducing and see how it goes.  
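
For reference, the reduction will be roughly along these lines, run from a 
cassandra-cli session against the busier column families (assuming the 1.0-style 
per-CF caches; the keyspace and CF names are just placeholders):

  use MyKeyspace;
  update column family MyColumnFamily with keys_cached = 10000 and rows_cached = 0;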

Didn't mention this before because I wasn't seeing the same errors yesterday, but 
last night I thought it might be worth bringing up.

Had changed the location of some data last week; a few days after that I started 
getting IOException errors for permissions issues.  When those occurred, the high 
CPU usage started almost simultaneously.  Fixed the permissions issues and 
restarted.  After that I would run into the high CPU situation once every 6-24 
hours, but without seeing the IOExceptions any more.  A restart of the node would 
fix it temporarily.  

Yesterday, I did a scrub on each of the nodes before restarting.  So far the 
issue hasn't returned.  Probably a coincidence, but thought I'd mention it anyway.
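
(For reference, the scrub was nothing fancy; something along these lines on each 
node in turn, with host and keyspace as placeholders, followed by a restart of 
that node:)

  nodetool -h <host> scrub MyKeyspace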


On 2012-01-25, at 7:49 PM, aaron morton wrote:

> You are running into GC issues. 
> 
>>> WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) 
>>> Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or 
>>> cache sizes.  Cassandra will now flush up to the two largest memtables to 
>>> free up memory.  Adjust flush_largest_memtables_at threshold in 
>>> cassandra.yaml if you don't want Cassandra to do this automatically
> 
> Can you reduce the size of the caches ? 
> 
> As you are under low load, does it correlate with compaction or repair 
> processes? Check nodetool compactionstats
> 
> Do you have wide rows? Check the max row size with nodetool cfstats. 
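> 
> Something along these lines should show both (just a sketch, adjust the host 
> as needed):
> 
>   nodetool -h <host> compactionstats
>   nodetool -h <host> cfstats | grep -E 'Column Family:|Compacted row maximum size'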
> 
> Also, if you have made any changes to the default memory and gc settings try 
> reverting them. 
> 
> 
> Hope that helps. 
> 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 26/01/2012, at 5:24 AM, Vitalii Tymchyshyn wrote:
> 
>> According to the log, I don't see much time spent on GC. You can still 
>> check it with jstat or by uncommenting the GC logging in cassandra-env.sh. Are 
>> you sure you've identified the thread correctly?
>> It's still possible that you have a memory spike where the GCInspector simply 
>> has no chance to run between Full GC rounds. Checking with jstat or adding GC 
>> logging may help to diagnose it.
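>> 
>> For example, something like this (only a sketch; the pid and paths will differ 
>> on your box):
>> 
>>   jstat -gcutil <cassandra-pid> 1000
>>   # or uncomment GC logging lines such as these in cassandra-env.sh:
>>   # JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
>>   # JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"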
>> 
>> On 25.01.12 17:24, Matthew Trinneer wrote:
>>> Here is a snippet of what I'm getting out of system.log for GC.  Does 
>>> anything there provide a clue?
>>> 
>>>  WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 
>>> 146) Heap is 0.7767292149986439 full.  You may need to reduce memtable 
>>> and/or cache sizes.  Cassandra will now flush up to the two largest 
>>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold 
>>> in cassandra.yaml if you don't want Cassandra to do this automatically
>>>  INFO [ScheduledTasks:1] 2012-01-22 12:54:57,685 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 240 ms for 1 collections, 111478936 used; 
>>> max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-22 15:12:21,710 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 1141 ms for 1 collections, 167667688 used; 
>>> max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-23 14:20:32,862 GCInspector.java (line 
>>> 123) GC for ParNew: 205 ms for 1 collections, 2894546328 used; max is 
>>> 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-23 20:25:06,541 GCInspector.java (line 
>>> 123) GC for ParNew: 240 ms for 1 collections, 4602331064 used; max is 
>>> 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:24:57,473 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 27869 ms for 1 collections, 6376733632 
>>> used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:25:24,879 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 26306 ms for 1 collections, 6392079368 
>>> used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:27:12,991 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 238 ms for 1 collections, 131710776 used; 
>>> max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 13:55:48,326 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 609 ms for 1 collections, 50380160 used; 
>>> max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 14:34:41,392 GCInspector.java (line 
>>> 123) GC for ParNew: 325 ms for 1 collections, 1340375240 used; max is 
>>> 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-24 20:55:19,636 GCInspector.java (line 
>>> 123) GC for ParNew: 233 ms for 1 collections, 6387236992 used; max is 
>>> 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:43:28,921 GCInspector.java (line 
>>> 123) GC for ParNew: 337 ms for 1 collections, 7031219304 used; max is 
>>> 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:43:51,043 GCInspector.java (line 
>>> 123) GC for ParNew: 211 ms for 1 collections, 7025723712 used; max is 
>>> 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:50:00,012 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 51534 ms for 2 collections, 6844998736 
>>> used; max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:51:22,249 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 250 ms for 1 collections, 154848440 used; 
>>> max is 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 14:57:46,519 GCInspector.java (line 
>>> 123) GC for ParNew: 244 ms for 1 collections, 190838344 used; max is 
>>> 8547991552
>>>  INFO [ScheduledTasks:1] 2012-01-25 15:00:21,693 GCInspector.java (line 
>>> 123) GC for ConcurrentMarkSweep: 389 ms for 1 collections, 28748448 used; 
>>> max is 8547991552
>>> 
>>> 
>>> 
>>> On 2012-01-25, at 10:09 AM, Vitalii Tymchyshyn wrote:
>>> 
>>>> Hello.
>>>> 
>>>> What's in the logs? It should output something like "Hey, you've got most 
>>>> of your memory used. I am going to flush some of the memtables". Sorry, I 
>>>> don't remember the exact wording, but it comes from GC, so it should be 
>>>> greppable by "GC".
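>>>> 
>>>> For example (assuming the default packaged log location; adjust the path if 
>>>> yours differs):
>>>> 
>>>>   grep GC /var/log/cassandra/system.log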
>>>> 
>>>> On 25.01.12 16:26, Matthew Trinneer wrote:
>>>>> Hello Community,
>>>>> 
>>>>> Am troubleshooting an issue with sudden and sustained high CPU on nodes 
>>>>> in a 3-node cluster.  This takes place when there is minimal load on the 
>>>>> servers, and continues indefinitely until I stop and restart a node.  All 
>>>>> nodes (3) seem to be affected by the same issue; however, it doesn't occur 
>>>>> simultaneously on them (although all potentially could be affected at the 
>>>>> same time).
>>>>> 
>>>>> Here's what I've been doing to troubleshoot.
>>>>> 
>>>>> * vmstat - have run on all servers and there is *no* swap going on
>>>>> 
>>>>> * iostat - have run on all servers and there is no indication that the 
>>>>> disk is in any way a bottleneck
>>>>> 
>>>>> * nodetool - nothing is backed up.  Not using all available heap (about 
>>>>> 2/3rds).
>>>>> 
>>>>> * Have tracked it down to a specific thread (top -H).  When I find that 
>>>>> thread's hex equivalent in a jstack dump I see the following
>>>>> 
>>>>>   "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000001d48800 
>>>>> nid=0x4534 runnable
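>>>>> 
>>>>> (Roughly how I matched the thread up, for reference; the pid and tid values 
>>>>> here are placeholders:)
>>>>> 
>>>>>   top -H -p <cassandra-pid>          # note the PID of the hottest thread
>>>>>   printf '%x\n' <hot-thread-pid>     # convert that thread id to hex
>>>>>   jstack <cassandra-pid> | grep 'nid=0x<hex-value>'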
>>>>> 
>>>>> 
>>>>> Which makes me think that perhaps it's something to do with GC 
>>>>> configuration.   Unfortunately I'm not able to determine why this might 
>>>>> be happening.  Any suggestions on how to continue troubleshooting?
>>>>> 
>>>>> btw - here's my jvm (just in case)
>>>>> 
>>>>> java version "1.6.0_22"
>>>>> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
>>>>> 
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> Matt
>>>>> 
>> 
> 
