There are a lot of factors that go into tuning, and I don't know of any
reliable formula you can use to figure out what will work optimally for your
hardware. Personally I recommend:

1) find the bottleneck
2) play with a parameter (or two)
3) see what changed, performance-wise

If you've got a specific question I think someone can find a way to help, but
asking "what can 8GB of heap give me" is pretty abstract and unanswerable.
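As a concrete starting point for 1)-3), something like the following lets you
watch GC and thread-pool pressure while you change one setting at a time
(jstat ships with the JDK; the pid lookup and the 10-second interval are just
assumptions about your setup):

    CASS_PID=$(pgrep -f CassandraDaemon | head -n1)   # assumes one Cassandra JVM per host

    # eden/survivor/old occupancy plus young and full GC counts/times, one sample every 10s
    jstat -gcutil "$CASS_PID" 10s

    # in another terminal: pending/blocked stages point at the actual bottleneck
    nodetool tpstats

Change one thing, watch those numbers and your client latency, and then decide
whether to keep it.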
Jon

On Sun Dec 07 2014 at 8:03:53 AM Philo Yang <ud1...@gmail.com> wrote:

> 2014-12-05 15:40 GMT+08:00 Jonathan Haddad <j...@jonhaddad.com>:
>
>> I recommend reading through
>> https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how
>> the JVM GC works and what you can do to tune it. Also good is Blake
>> Eggleston's writeup, which can be found here:
>> http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html
>>
>> I'd like to note that allocating a 4GB heap to Cassandra under any serious
>> workload is unlikely to be sufficient.
>>
> Thanks for your recommendation. After reading those, I tried allocating a
> larger heap and it helped; a 4G heap indeed can't handle the workload in my
> use case.
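>
> For anyone following along, the knobs live in conf/cassandra-env.sh; the
> values below are only an illustration of the format, not a recommendation
> for any particular workload (the script expects both to be set, or neither):
>
>     MAX_HEAP_SIZE="8G"    # default works out to roughly 1/4 of RAM, capped at 8G
>     HEAP_NEWSIZE="800M"   # CMS young gen; the stock comments suggest ~100MB per core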
>
> So another question is: how much pressure can the default max heap (8G)
> handle? The "pressure" may not be simple qps; a slice query over many columns
> in a row allocates more objects on the heap than a query for a single column.
> Are there any test results on the relationship between that "pressure" and a
> "safe" heap size? We know that querying a slice with many tombstones is a bad
> use case, but querying a slice without tombstones should be a common one,
> right?
>
>> On Thu Dec 04 2014 at 8:43:38 PM Philo Yang <ud1...@gmail.com> wrote:
>>
>>> I have two kinds of machine:
>>> 16G RAM, with the default heap size setting, about 4G.
>>> 64G RAM, with the default heap size setting, about 8G.
>>>
>>> Both kinds of node have the same number of vnodes, and both have the GC
>>> issue, although the 16G nodes hit it more often.
>>>
>>> Thanks,
>>> Philo Yang
>>>
>>> 2014-12-05 12:34 GMT+08:00 Tim Heckman <t...@pagerduty.com>:
>>>
>>>> On Dec 4, 2014 8:14 PM, "Philo Yang" <ud1...@gmail.com> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I have a cluster on C* 2.1.1 and JDK 1.7_u51. I have trouble with full
>>>> > GC: sometimes one or two nodes run a full GC more than once per minute,
>>>> > taking over 10 seconds each time; the node then becomes unreachable and
>>>> > the latency of the cluster goes up.
>>>> >
>>>> > Grepping the GCInspector log, I found that when a node is running fine,
>>>> > without GC trouble, there are two kinds of GC:
>>>> > ParNew GC in less than 300ms, which clears Par Eden Space and grows CMS
>>>> > Old Gen / Par Survivor Space only a little (because only GCs over 200ms
>>>> > are logged, there are only a few ParNew entries in the log).
>>>> > ConcurrentMarkSweep in 4000~8000ms, which shrinks CMS Old Gen a lot and
>>>> > grows Par Eden Space a little; it runs about once every 1-2 hours.
>>>> >
>>>> > However, sometimes ConcurrentMarkSweep behaves strangely, like this:
>>>> >
>>>> > INFO [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms. CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms. CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms. CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms. CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms. CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms. CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms. CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms. CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0
>>>> >
>>>> > Each time, Old Gen shrinks only a little and Survivor Space is cleared,
>>>> > but the heap is still full, so another full GC follows very soon and the
>>>> > node goes down. If I restart the node it runs fine, without GC trouble.
>>>> >
>>>> > Can anyone help me figure out why full GC can't reduce CMS Old Gen? Is
>>>> > it because there are too many objects in the heap that can't be
>>>> > recycled? I think reviewing the table schema design and adding new nodes
>>>> > to the cluster is a good idea, but I still want to know whether anything
>>>> > else could be causing this.
>>>>
>>>> How much total system memory do you have? How much is allocated for heap
>>>> usage? How big is your working data set?
>>>>
>>>> The reason I ask is that I've seen problems with lots of GC and no room
>>>> gained, and it was memory pressure: not enough for the heap. We decided
>>>> that just increasing the heap size was a bad idea, since we relied on free
>>>> RAM for filesystem caching. So some vertical and horizontal scaling let us
>>>> give Cassandra more heap space as well as distribute the workload, to try
>>>> to avoid further problems.
>>>>
>>>> > Thanks,
>>>> > Philo Yang
>>>>
>>>> Cheers!
>>>> -Tim
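For the "why won't full GC shrink the old gen" question above, a live-object
histogram is a cheap first look at what is actually holding the heap, and free
shows how much RAM is left over for the page cache Tim mentions (a rough
sketch; substitute your node's JVM pid):

    jmap -histo:live <pid> | head -n 25   # :live forces a full GC, then lists classes by live instance count and shallow size
    free -m                               # the "cached" column is the page cache that reads rely on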