On 10/30/2013 11:34 AM, Jason Tang wrote:
What's the configuration of the following parameters?
memtable_flush_queue_size:
concurrent_compactors:
memtable_flush_queue_size:  4
concurrent_compactors: 1

Looking at the metrics I did not see FlushWriter pending issues,
and the compactions are keeping up with the pace.
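
For reference, these were checked with the standard nodetool commands
(assuming nodetool can reach the node locally):

    nodetool -h localhost tpstats           # FlushWriter pool: pending flushes
    nodetool -h localhost compactionstats   # pending compaction tasks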

tnx


2013/10/30 Piavlo <lolitus...@gmail.com>

    Hi,

    Below I try to give a full picture of the problem I'm facing.

    This is a 12-node cluster, running on EC2 with m2.xlarge instances
    (17G RAM, 2 CPUs).
    Cassandra version is 1.0.8.
    The cluster normally does between 1500 and 3000 reads per second
    (depending on the time of day) and between 800 and 1700 writes per
    second, according to OpsCenter.
    RF=3, and no row caches are used.

    Memory-relevant configs from cassandra.yaml:
    flush_largest_memtables_at: 0.85
    reduce_cache_sizes_at: 0.90
    reduce_cache_capacity_to: 0.75
    commitlog_total_space_in_mb: 4096
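
    To put flush_largest_memtables_at in context with the heap size
    listed below: 0.85 of an 8000M heap is roughly 6800M, i.e. the
    emergency flushing of the largest memtables only kicks in once about
    6.8G of the heap is in use.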

    The relevant JVM options used are:
    -Xms8000M -Xmx8000M -Xmn400M
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=1
    -XX:CMSInitiatingOccupancyFraction=80
    -XX:+UseCMSInitiatingOccupancyOnly
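
    For completeness, GC logging could also be turned on with the
    standard HotSpot flags, which makes this kind of slow heap creep
    easier to inspect after the fact (the log path here is just an
    example):

        -Xloggc:/var/log/cassandra/gc.log
        -XX:+PrintGCDetails -XX:+PrintGCDateStamps
        -XX:+PrintTenuringDistribution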

    Now what happens is that with these settings, after a cassandra
    process restart, GC works fine at the beginning and the heap usage
    looks like a saw with perfect teeth. Eventually the teeth start to
    shrink until they become barely noticeable, and then cassandra
    starts to spend lots of CPU time doing GC. Such a cycle takes about
    2 weeks, and then I need to restart the cassandra process to restore
    performance.
    During all this time there are no memory-related messages in
    cassandra's system.log, except a "GC for ParNew: little above 200ms"
    once in a while.

    Things I've already done trying to reduce this eventual heap pressure:
    1) Reducing bloom_filter_fp_chance, which brought the total down
    from ~700MB to ~280MB per node, based on the sizes of all Filter.db
    files on the node (a quick way to check this is shown right after
    this list).
    2) Reducing key cache sizes, and dropping key caches for CFs which
    do not have many reads.
    3) Increasing the heap size from 7000M to 8000M.
    None of these really helped; only the increase from 7000M to 8000M
    helped, by stretching the cycle until excessive GC from ~9 days to
    ~14 days.
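
    (The Filter.db totals above can be re-checked with a one-liner along
    these lines, assuming the default data directory:

        du -ch /var/lib/cassandra/data/*/*-Filter.db | tail -1
    )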

    I've tried to graph, over time, the data that is supposed to be in
    the heap vs the actual heap size, by summing up all CFs' bloom
    filter sizes + all CFs' key cache capacities multiplied by the
    average key size + all CFs' reported memtable data sizes (I've
    overestimated the data size a bit on purpose, to be on the safe
    side).
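
    In other words, the per-node estimate is roughly

        sum over all CFs of ( bloom filter size
                              + key cache capacity * average key size
                              + memtable data size )

    with the per-CF bloom filter, key cache and memtable numbers taken
    from what the node itself reports in nodetool cfstats.
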
    Here is a link to a graph showing the last 2 days of metrics for a
    node which could not effectively do GC, until the cassandra process
    was restarted:
    http://awesomescreenshot.com/0401w5y534
    You can clearly see that before and after the restart, the size of
    the data that is supposed to be in the heap is pretty much the same,
    which makes me think that what I really need is GC tuning.

    Also, I suppose this is not due to the total number of keys each
    node has, which is between 200 and 300 million keys (summing the key
    estimates of all CFs on a node).
    The nodes have data sizes between 45G and 75G, roughly in proportion
    to their number of keys, and all nodes start having heavy GC load
    after about 14 days.
    Also, the excessive GC and heap usage are not affected by the load,
    which varies depending on the time of day (see the read/write rates
    at the beginning of the mail).
    So again, based on this I assume it is not due to a large number of
    keys or too much load on the cluster, but due to a pure GC
    misconfiguration issue.

    Things I remember that I've tried for GC tuning:
    1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not
    help.
    2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
    -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
    -XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1
    - this actually made things worse.
    3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not
    help.

    Also, since it takes about 2 weeks to verify that a GC setting
    change did not help, trying all the possibilities is a painfully
    slow process :)
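
    One thing that might shorten that feedback loop is watching the old
    generation directly with the stock JDK tool jstat on the cassandra
    JVM (the pid is a placeholder):

        jstat -gcutil <cassandra-pid> 10000
        # prints old-gen occupancy (O) and full-GC counters/time
        # (FGC/FGCT) every 10 seconds; a steadily climbing O with
        # growing FGCT is the same shrinking-teeth pattern.
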
    I'd highly appreciate any help and hints on the GC tuning.

    tnx
    Alex
