My general "I can haz heap space?" approach. 

* determine the total row count for the node from cfstats
* determine whether wide rows (10s of MB) are in use
* determine the total bloom filter space for the node from cfstats
* enable full GC logging in cassandra-env.sh
* determine the tenured heap low point shortly after startup and again after running for a while.
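
A minimal sketch of the last step. It assumes the plain one-line-per-collection format used in Oleg's log further down ("[GC before->after(total), secs]"). After a young collection eden is empty, so the post-GC occupancy is mostly tenured data, and its lowest value is a reasonable proxy for the tenured low point. gc_low_point is a hypothetical helper and is not part of Cassandra. With -XX:+PrintGCDetails each line carries per-generation figures, and the parsing would need adjusting.

```shell
# Hypothetical helper (not part of Cassandra): print the lowest heap occupancy,
# in K, seen after any collection in a plain one-line-per-GC log such as
#   ...: [GC 1677824K->11345K(16567552K), 0.0509880 secs]
# Reads the files given as arguments, or stdin if none.
gc_low_point() {
  awk '
    {
      # The after-GC occupancy follows the first "->" and ends at the next "K".
      i = index($0, "->")
      if (i == 0) next
      rest = substr($0, i + 2)
      k = index(rest, "K")
      if (k == 0) next
      after = substr(rest, 1, k - 1) + 0
      if (!seen || after < low) { low = after; seen = 1 }
    }
    END { if (seen) printf "%d\n", low }
  ' "$@"
}
```

For example, piping head -n 500 and tail -n 500 of the GC log through gc_low_point gives the low point shortly after startup and after running for a while.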

Consider locking memtable_total_space_in_mb at 2048 rather than the default of 1/3 of the heap while tuning.
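
As a rough illustration of what locking the value changes: the default memtable budget is about a third of the heap. The hypothetical helper below takes a heap size in MB and shows how far that default exceeds 2048 on a large heap.

```shell
# Hypothetical helper: approximate default memtable_total_space_in_mb,
# i.e. one third of the heap, for a heap size given in MB.
default_memtable_mb() {
  heap_mb=$1
  echo $(( heap_mb / 3 ))
}

# Locking it while tuning is a one-line change in cassandra.yaml:
#   memtable_total_space_in_mb: 2048
```

With a 16 GB heap (16384 MB), the default works out to about 5461 MB of memtables. That is more than 2.5 times the 2048 MB cap.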

Consider changing the JVM GC settings as below to check for premature tenuring (a possibility with wide rows and wide reads):
        # in cassandra-env.sh; note no spaces around "=" in shell assignments
        HEAP_NEWSIZE="1200M"
        JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
        JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"

^ Look at the tenuring distribution to see how many objects make it through all 4 ParNew passes. You will then want to return the settings to something closer to the defaults, maybe HEAP_NEWSIZE 1000M, SurvivorRatio 4 and MaxTenuringThreshold 2.
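
A minimal sketch for reading the distribution. It assumes -XX:+PrintTenuringDistribution is enabled alongside the GC logging, which prints "- age N: X bytes, Y total" lines after each ParNew. tenuring_summary is a hypothetical helper that averages the bytes at each age across collections. A large figure at age 4 means many objects are surviving every pass and are about to be promoted.

```shell
# Hypothetical helper: average -XX:+PrintTenuringDistribution output by age.
# Matching lines look like:
#   - age   1:    8382904 bytes,    8382904 total
tenuring_summary() {
  awk '
    $1 == "-" && $2 == "age" {
      age = $3 + 0              # "1:" converts to 1
      bytes[age] += $4
      samples[age]++
      if (age > oldest) oldest = age
    }
    END {
      for (a = 1; a <= oldest; a++)
        if (samples[a] > 0)
          printf "age %d: avg_bytes=%d samples=%d\n", a, bytes[a] / samples[a], samples[a]
    }
  ' "$@"
}
```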

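If there are more than 500 million rows and/or the bloom filter size is over 750 MB, consider:
        raising bloom_filter_fp_chance (per CF) to 0.01 or 0.1, then running nodetool upgradesstables so existing SSTables are rebuilt with the new setting
        increasing index_interval in cassandra.yaml to reduce the number of index samples held in memory
        watching the key cache hit rate and considering increasing the key cache to 200MB, since fewer index samples make each uncached lookup slower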
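
The checks in both lists above can be scripted against nodetool cfstats. This is a sketch under several assumptions:
* the labels ("Number of Keys (estimate)", "Bloom Filter Space Used", "Compacted row maximum size") and their byte-valued figures are taken from 1.2-era output and may differ in other versions
* MB is taken as 2^20 bytes
* cfstats_summary is a hypothetical helper

```shell
# Hypothetical helper: total the per-CF figures from nodetool cfstats output.
# Assumes 1.2-era labels with byte values; check them against your version.
cfstats_summary() {
  awk '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # cfstats indents its lines
      i = index(line, ": ")
      if (i == 0) next
      label = substr(line, 1, i - 1)
      value = substr(line, i + 2) + 0
      if (label == "Number of Keys (estimate)") keys += value
      else if (label == "Bloom Filter Space Used") bloom += value
      else if (label == "Compacted row maximum size" && value > widest) widest = value
    }
    END {
      mb = 1048576
      printf "keys=%.0f bloom_filter_mb=%.1f max_row_mb=%.1f\n", keys, bloom / mb, widest / mb
      # Thresholds from the advice above: > 500 million rows or > 750 MB of bloom filter.
      if (keys > 500000000 || bloom > 750 * mb)
        print "over threshold: review bloom_filter_fp_chance, index_interval and key cache"
    }
  ' "$@"
}
```

Usage: nodetool cfstats | cfstats_summary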

If you have a high tenured heap that is not decreasing after CMS runs, the first places to look are the bloom filters and index samples. For a CF where bloom_filter_fp_chance is not specified, the default is 0.000744.
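
To see why raising bloom_filter_fp_chance helps, consider the textbook sizing for an optimal Bloom filter: -ln(p) / (ln 2)^2 bits per key for a false positive chance p. The sketch below applies that formula. Cassandra's own allocation may differ somewhat, and the function names are hypothetical.

```shell
# Textbook Bloom filter sizing: an optimally configured filter needs
# -ln(p) / (ln 2)^2 bits per key for a false positive chance p.
bloom_bits_per_key() {
  awk -v p="$1" 'BEGIN { printf "%.2f\n", -log(p) / (log(2) ^ 2) }'
}

# Approximate filter size in MB (2^20 bytes) for a key count and p.
bloom_mb() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.0f\n", n * (-log(p)) / (log(2) ^ 2) / 8 / 1048576 }'
}
```

By this estimate the default 0.000744 costs about 15 bits per key, so 500 million keys need roughly 894 MB. That is already past the 750 MB mark. A value of 0.01 brings it to about 571 MB, and 0.1 to about 286 MB.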

Hope that helps. 
  
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/05/2013, at 7:20 AM, Oleg Dulin <oleg.du...@gmail.com> wrote:

> What constitutes an "extreme write"?
> 
> On 2013-05-03 15:45:33 +0000, Edward Capriolo said:
> 
> If your writes are so extreme that memtables are flushing all the time, the best you can do is turn off all caches, move bloom filters off heap, and then instruct Cassandra to use large portions of the heap for memtables.
> 
> 
> On Fri, May 3, 2013 at 11:40 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
> It's true that a 16GB heap is generally not a good idea; however, it's not 
> clear from the data provided what problem you're trying to solve.
> 
> What is it that you don't like about the default settings?
> 
> -Bryan
> 
> 
> 
> On Fri, May 3, 2013 at 4:27 AM, Oleg Dulin <oleg.du...@gmail.com> wrote:
> Here is my question. It can't possibly be a good setup to use 16 GB of heap space, but this is the best I can do. Setting it to the default never worked well for me, and setting it to 8 GB doesn't work well either: it can't keep up with flushing memtables. It is possible that someone at some point broke something in the config files. If I were to look for hints there, what should I look at?
> 
> Look at my gc log from Cassandra:
> 
> Starts off like this:
> 
> 2013-04-29T08:53:44.548-0400: 5.386: [GC 1677824K->11345K(16567552K), 0.0509880 secs]
> 2013-04-29T08:53:47.701-0400: 8.539: [GC 1689169K->42027K(16567552K), 0.1269180 secs]
> 2013-04-29T08:54:05.361-0400: 26.199: [GC 1719851K->231763K(16567552K), 0.1436070 secs]
> 2013-04-29T08:55:44.797-0400: 125.635: [GC 1909587K->1480096K(16567552K), 1.2626270 secs]
> 2013-04-29T08:58:44.367-0400: 305.205: [GC 3157920K->2358588K(16567552K), 1.1198150 secs]
> 2013-04-29T09:01:12.167-0400: 453.005: [GC 4036412K->3634298K(16567552K), 1.0098650 secs]
> 2013-04-29T09:03:35.204-0400: 596.042: [GC 5312122K->4339703K(16567552K), 0.4597180 secs]
> 2013-04-29T09:04:51.562-0400: 672.400: [GC 6017527K->4956381K(16567552K), 0.5361800 secs]
> 2013-04-29T09:04:59.205-0400: 680.043: [GC 6634205K->5131825K(16567552K), 0.1741690 secs]
> 2013-04-29T09:05:06.638-0400: 687.476: [GC 6809649K->5027933K(16567552K), 0.0607470 secs]
> 2013-04-29T09:05:13.908-0400: 694.747: [GC 6705757K->5012439K(16567552K), 0.0624410 secs]
> 2013-04-29T09:05:20.909-0400: 701.747: [GC 6690263K->5039538K(16567552K), 0.0618750 secs]
> 2013-04-29T09:06:35.914-0400: 776.752: [GC 6717362K->5819204K(16567552K), 0.5738550 secs]
> 2013-04-29T09:08:05.589-0400: 866.428: [GC 7497028K->6678597K(16567552K), 0.6781900 secs]
> 2013-04-29T09:08:12.458-0400: 873.296: [GC 8356421K->6865736K(16567552K), 0.1423040 secs]
> 2013-04-29T09:08:18.690-0400: 879.529: [GC 8543560K->6742902K(16567552K), 0.0516470 secs]
> 2013-04-29T09:08:24.914-0400: 885.752: [GC 8420726K->6725877K(16567552K), 0.0517290 secs]
> 2013-04-29T09:08:31.008-0400: 891.846: [GC 8403701K->6741781K(16567552K), 0.0532540 secs]
> 2013-04-29T09:08:37.201-0400: 898.039: [GC 8419605K->6759614K(16567552K), 0.0563290 secs]
> 2013-04-29T09:08:43.493-0400: 904.331: [GC 8437438K->6772147K(16567552K), 0.0569580 secs]
> 2013-04-29T09:08:49.757-0400: 910.595: [GC 8449971K->6776883K(16567552K), 0.0558070 secs]
> 2013-04-29T09:08:55.973-0400: 916.812: [GC 8454707K->6789404K(16567552K), 0.0577230 secs]
> 
> ……
> 
> 
> look what it is today:
> 
> 2013-05-03T07:17:13.519-0400: 339814.357: [GC 9178946K->9176740K(16567552K), 0.0265830 secs]
> 2013-05-03T07:17:19.556-0400: 339820.394: [GC 10854564K->9178449K(16567552K), 0.0253180 secs]
> 2013-05-03T07:17:24.390-0400: 339825.228: [GC 10856273K->9179073K(16567552K), 0.0266450 secs]
> 2013-05-03T07:17:30.729-0400: 339831.567: [GC 10856897K->9178629K(16567552K), 0.0261150 secs]
> 2013-05-03T07:17:35.584-0400: 339836.422: [GC 10856453K->9178586K(16567552K), 0.0250870 secs]
> 2013-05-03T07:17:38.514-0400: 339839.352: [GC 10856410K->9179314K(16567552K), 0.0258120 secs]
> 2013-05-03T07:17:43.200-0400: 339844.038: [GC 10857138K->9180160K(16567552K), 0.0250150 secs]
> 2013-05-03T07:17:46.566-0400: 339847.404: [GC 10857984K->9179071K(16567552K), 0.0264420 secs]
> 2013-05-03T07:17:52.913-0400: 339853.751: [GC 10856895K->9179870K(16567552K), 0.0262430 secs]
> 2013-05-03T07:17:58.303-0400: 339859.141: [GC 10857694K->9179209K(16567552K), 0.0255130 secs]
> 2013-05-03T07:18:03.427-0400: 339864.265: [GC 10857033K->9178316K(16567552K), 0.0263140 secs]
> 2013-05-03T07:18:11.657-0400: 339872.495: [GC 10856140K->9178351K(16567552K), 0.0265340 secs]
> 2013-05-03T07:18:17.429-0400: 339878.267: [GC 10856175K->9179067K(16567552K), 0.0254820 secs]
> 2013-05-03T07:18:21.251-0400: 339882.089: [GC 10856891K->9179680K(16567552K), 0.0264210 secs]
> 2013-05-03T07:18:25.062-0400: 339885.900: [GC 10857504K->9178985K(16567552K), 0.0267200 secs]
> 
> 
> 
> 
> -- 
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/
> 
