There are a lot of factors that go into tuning, and I don't know of any
reliable formula you can use to figure out what will work optimally for your
hardware. Personally I recommend:

1) find the bottleneck
2) play with a parameter (or two)
3) see what changed, performance-wise

If you've got a specific question I think someone can find a way to help, but
asking "what can 8GB of heap give me" is pretty abstract and unanswerable.
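As a concrete starting point for 1)-3), something like the following lets you
watch GC and thread-pool pressure while you change one setting at a time
(jstat ships with the JDK; the pid lookup and the 10-second interval are just
assumptions about your setup):

    CASS_PID=$(pgrep -f CassandraDaemon | head -n1)   # assumes one Cassandra JVM per host

    # eden/survivor/old occupancy plus young and full GC counts/times, one sample every 10s
    jstat -gcutil "$CASS_PID" 10s

    # in another terminal: pending/blocked stages point at the actual bottleneck
    nodetool tpstats

Change one thing, watch those numbers and your client latency, and then decide
whether to keep it.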
Jon

On Sun Dec 07 2014 at 8:03:53 AM Philo Yang <ud1...@gmail.com> wrote:

> 2014-12-05 15:40 GMT+08:00 Jonathan Haddad <j...@jonhaddad.com>:
>
>> I recommend reading through
>> https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how
>> the JVM GC works and what you can do to tune it. Also good is Blake
>> Eggleston's writeup, which can be found here:
>> http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html
>>
>> I'd like to note that allocating a 4GB heap to Cassandra under any serious
>> workload is unlikely to be sufficient.
>>
> Thanks for your recommendation. After reading those, I tried allocating a
> larger heap and it helped; a 4G heap indeed can't handle the workload in my
> use case.
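>
> For anyone following along, the knobs live in conf/cassandra-env.sh; the
> values below are only an illustration of the format, not a recommendation
> for any particular workload (the script expects both to be set, or neither):
>
>     MAX_HEAP_SIZE="8G"    # default works out to roughly 1/4 of RAM, capped at 8G
>     HEAP_NEWSIZE="800M"   # CMS young gen; the stock comments suggest ~100MB per core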
>
> So another question is: how much pressure can the default max heap (8G)
> handle? The "pressure" may not be simple qps; a slice query over many columns
> in a row allocates more objects on the heap than a query for a single column.
> Are there any test results on the relationship between that "pressure" and a
> "safe" heap size? We know that querying a slice with many tombstones is a bad
> use case, but querying a slice without tombstones should be a common one,
> right?
>
>> On Thu Dec 04 2014 at 8:43:38 PM Philo Yang <ud1...@gmail.com> wrote:
>>
>>> I have two kinds of machine:
>>> 16G RAM, with the default heap size setting, about 4G.
>>> 64G RAM, with the default heap size setting, about 8G.
>>>
>>> Both kinds of node have the same number of vnodes, and both have the GC
>>> issue, although the 16G nodes hit it more often.
>>>
>>> Thanks,
>>> Philo Yang
>>>
>>> 2014-12-05 12:34 GMT+08:00 Tim Heckman <t...@pagerduty.com>:
>>>
>>>> On Dec 4, 2014 8:14 PM, "Philo Yang" <ud1...@gmail.com> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I have a cluster on C* 2.1.1 and JDK 1.7_u51. I have trouble with full
>>>> > GC: sometimes one or two nodes run a full GC more than once per minute,
>>>> > taking over 10 seconds each time; the node then becomes unreachable and
>>>> > the latency of the cluster goes up.
>>>> >
>>>> > Grepping the GCInspector log, I found that when a node is running fine,
>>>> > without GC trouble, there are two kinds of GC:
>>>> > ParNew GC in less than 300ms, which clears Par Eden Space and grows CMS
>>>> > Old Gen / Par Survivor Space only a little (because only GCs over 200ms
>>>> > are logged, there are only a few ParNew entries in the log).
>>>> > ConcurrentMarkSweep in 4000~8000ms, which shrinks CMS Old Gen a lot and
>>>> > grows Par Eden Space a little; it runs about once every 1-2 hours.
>>>> >
>>>> > However, sometimes ConcurrentMarkSweep behaves strangely, like this:
>>>> >
>>>> > INFO [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms. CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms. CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms. CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms. CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms. CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms. CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms. CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
>>>> > INFO [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms. CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0
>>>> >
>>>> > Each time, Old Gen shrinks only a little and Survivor Space is cleared,
>>>> > but the heap is still full, so another full GC follows very soon and the
>>>> > node goes down. If I restart the node it runs fine, without GC trouble.
>>>> >
>>>> > Can anyone help me figure out why full GC can't reduce CMS Old Gen? Is
>>>> > it because there are too many objects in the heap that can't be
>>>> > recycled? I think reviewing the table schema design and adding new nodes
>>>> > to the cluster is a good idea, but I still want to know whether anything
>>>> > else could be causing this.
>>>>
>>>> How much total system memory do you have? How much is allocated for heap
>>>> usage? How big is your working data set?
>>>>
>>>> The reason I ask is that I've seen problems with lots of GC and no room
>>>> gained, and it was memory pressure: not enough for the heap. We decided
>>>> that just increasing the heap size was a bad idea, since we relied on free
>>>> RAM for filesystem caching. So some vertical and horizontal scaling let us
>>>> give Cassandra more heap space as well as distribute the workload, to try
>>>> to avoid further problems.
>>>>
>>>> > Thanks,
>>>> > Philo Yang
>>>>
>>>> Cheers!
>>>> -Tim
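For the "why won't full GC shrink the old gen" question above, a live-object
histogram is a cheap first look at what is actually holding the heap, and free
shows how much RAM is left over for the page cache Tim mentions (a rough
sketch; substitute your node's JVM pid):

    jmap -histo:live <pid> | head -n 25   # :live forces a full GC, then lists classes by live instance count and shallow size
    free -m                               # the "cached" column is the page cache that reads rely on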