On Fri, Jan 29, 2016 at 12:30 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>
> Hello,
> We have another update on performance on 2.1.11. compression_chunk_size
> didn’t really help much, but we changed concurrent_compactors from the
> default to 64 in 2.1.11 and read latencies improved significantly.
> However, 2.1.11 read latencies are still 1.5x slower than 2.0.9. One
> thing we noticed in the JMX metrics that could affect read latencies is
> that 2.1.11 is running ReadRepairedBackground and ReadRepairedBlocking
> far more frequently than 2.0.9, even though our read_repair_chance is
> the same on both. Could anyone shed some light on why 2.1.11 could be
> running read repair 10 to 50 times more in spite of the same
> configuration on both clusters?
>
> dclocal_read_repair_chance=0.100000 AND
> read_repair_chance=0.000000 AND
>
> Here are the read repair metrics for both clusters:
>
>                                      2.0.9    2.1.11
> ReadRepairedBackground  5MinAvg      0.006    0.1
>                         15MinAvg     0.009    0.153
> ReadRepairedBlocking    5MinAvg      0.002    0.55
>                         15MinAvg     0.007    0.91
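As a sanity check on those numbers, the same rates can be read straight
from JMX with a tool like jmxterm (the bean and attribute names below
assume the 2.1 metrics naming, and the jar name is just an example;
verify both against your build):

    echo "get -b org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking FiveMinuteRate FifteenMinuteRate" | \
        java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199 -n

Swap name=RepairedBackground into the bean to get the background rates.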
The concurrent_compactors setting is not a surprise. The default in 2.0
was the number of cores; in 2.1 it is now "the smaller of (number of
disks, number of cores), with a minimum of 2 and a maximum of 8":
https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cassandra.yaml#L567-L568

So in your case this was 8 in 2.0 vs. 2 in 2.1 (assuming these are still
the stock-ish c3.2xl instances mentioned previously?). Regardless, 64 is
way too high. Set it back to 8.

Note: this change got dropped from the "Upgrading" guide for 2.1 in
https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt
though, so lots of folks miss it.

Per said upgrading guide: are you sure the data directory is in the same
place between the two versions and you are not pegging the wrong
disk/partition? The default locations changed for data, cache and
commitlog:
https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L171-L180

I ask because being really busy on a single disk would cause latency and
potentially dropped messages, which could eventually cause a
DigestMismatchException requiring a blocking read repair. Anything
unusual in the node-level IO activity between the two clusters? (A quick
way to check both is sketched below.)

That said, a diff of nodetool tpstats output during and after the test
on both clusters could be insightful (also sketched below). When we do
perf tests internally we usually use a combination of Grafana and
Riemann to monitor Cassandra internals, the JVM and the OS. Otherwise,
it's guesswork.

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
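For reference, a rough sketch of the disk/config checks above, assuming
the common package layout (/etc/cassandra for config, /var/lib/cassandra
for data; adjust the paths to your install):

    # Confirm concurrent_compactors and the directory settings on each node:
    grep -En 'concurrent_compactors|data_file_directories|commitlog_directory|saved_caches_directory' \
        /etc/cassandra/cassandra.yaml

    # Confirm which device actually backs the data path, then watch
    # per-device utilization and latency while the test runs:
    df -h /var/lib/cassandra
    iostat -x 5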
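And for the tpstats comparison, capturing snapshots during and after the
run on a node from each cluster keeps the diff honest (file names here
are just examples):

    nodetool tpstats > tpstats-during.txt    # while the load test is running
    nodetool tpstats > tpstats-after.txt     # after it finishes
    diff tpstats-during.txt tpstats-after.txt
    # Repeat on the other cluster and compare the Pending/Blocked and
    # Dropped counts between 2.0.9 and 2.1.11.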