> FlushWriter                       0         0           8252         0               299

If you are not suffering from GC pressure/pauses (possibly not, since you don't 
seem to have many dropped reads in tpstats or outlier latencies on the 
histograms), then the blocked FlushWriter tasks are suggestive of memtable 
pressure, which may be followed by compactions that grind the disk.
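
For a quick sanity check (the log path assumes a packaged/DataStax AMI install; 
adjust to your layout):

# long stop-the-world pauses are logged by GCInspector
grep GCInspector /var/log/cassandra/system.log | tail -20

# watch the FlushWriter line; a growing "All time blocked" count means flushes
# are queueing behind the disk
nodetool -h localhost tpstats | grep FlushWriter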

> maybe because of bloomfilters

If you think bloom filters or indexes are occupying heap at startup, then you 
could alleviate things for a while with memtable/cache tuning, resampling the 
index interval, or increasing the heap to 10G (yes, not generally recommended). 
Not enough working RAM can also shrink the key cache, which then puts more 
pressure on the disk - check nodetool info to see if your caches are being 
resized. If disk I/O is simply falling behind on a server that doesn't have 
much memory headroom, then you'll want to expand the cluster at some point to 
spread out the load.
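
As a rough sketch of what I mean (option names are from the 1.1 cassandra.yaml; 
the values are placeholders to illustrate, not recommendations):

# check whether the caches have been shrunk under heap pressure
nodetool -h localhost info

# cassandra.yaml
index_interval: 256                 # default 128; larger = fewer index samples held in heap
key_cache_size_in_mb: 100           # give the key cache an explicit budget
flush_largest_memtables_at: 0.75    # emergency flush threshold (the default)
reduce_cache_sizes_at: 0.85         # heap threshold at which caches get resized downward

# conf/cassandra-env.sh
MAX_HEAP_SIZE="10G"
HEAP_NEWSIZE="800M"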

Bill


On 8 Feb 2013, at 13:03, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi,
> 
> I have some big latencies (the OpsCenter homepage shows an average of about 
> 30-60 ms), causing instability in my front servers, which stack up queries 
> while waiting for C* to answer, in the following 1.1.6 C* cluster:
> 
> 10.208.45.173   eu-west     1b          Up     Normal  297.02 GB       100.00%             0
> 10.208.40.6     eu-west     1b          Up     Normal  292.91 GB       100.00%             56713727820156407428984779325531226112
> 10.208.47.135   eu-west     1b          Up     Normal  307.96 GB       100.00%             113427455640312814857969558651062452224
> 
> I run on 3 AWS m1.xlarge nodes with mostly the DataStax AMI default node 
> configuration (but with MAX_HEAP_SIZE="8G" and HEAP_NEWSIZE="400M"; I was under 
> regular memory pressure with the default 4GB heap, maybe because of bloom 
> filters). RF = 3, CL = QUORUM for both reads and writes.
> 
> I have a high load, from 4 to 15 with an average of 8 (mainly because of 
> iowait, which can reach 40-60%).
> 
> extract from "iostat -mx 5 10":
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           16.66    0.00    4.82   35.47    0.21   42.85
> 
> 
> I use compression and the Size Tiered Compaction Strategy for all of my CFs.
> 
> A typical CF :
> 
> create column family active_product
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'UTF8Type'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 12
>   and replicate_on_write = true
>   and compaction_strategy = 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and bloom_filter_fp_chance = 0.01
>   and compression_options = {'sstable_compression' : 
> 'org.apache.cassandra.io.compress.SnappyCompressor'};
> 
> And here is a typical counter CF:
> 
> create column family algo_product_view
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'CounterColumnType'
>   and key_validation_class = 'UTF8Type'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 12
>   and replicate_on_write = true
>   and compaction_strategy = 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and bloom_filter_fp_chance = 0.01
>   and compression_options = {'sstable_compression' : 
> 'org.apache.cassandra.io.compress.SnappyCompressor'};
> 
> I attach my cfhistograms, proxyhistograms, cfstats and tpstats, hoping a clue 
> is somewhere in there, even though I was unable to learn anything from them 
> myself.
> 
> cfstats: http://pastebin.com/z3sAshjP
> tpstats: http://pastebin.com/LETPqfLV
> proxyhistograms: http://pastebin.com/FqwMFrxG
> cfhistograms (from the 2 most read / highest latencies):  
> http://pastebin.com/BCsdc50z & http://pastebin.com/CGZZpydL
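> 
> For reference, the above come from plain nodetool, roughly (keyspace / column 
> family names elided):
> 
> nodetool -h localhost cfstats
> nodetool -h localhost tpstats
> nodetool -h localhost proxyhistograms
> nodetool -h localhost cfhistograms <keyspace> <column_family>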
> 
> These latencies are quite annoying; I hope you'll help me figure out what I am 
> doing wrong or how I can tune Cassandra better.
> 
> Alain 
