@JC, get the pid of your target java process (something like "ps -ef | grep -i cassandra"). Then do a "kill -3 <pid>" (on unix/linux) and check the stdout logfile of the process: it should contain the thread dump. If you found it, then great! Let that kill -3 run in a loop for about 2 or 3 minutes. Afterwards, copy the stdout file and load it into one of the mentioned tools. If you are not familiar with the java internals, those thread dumps will teach you a lot :)
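A minimal loop sketch, assuming bash; the "CassandraDaemon" pattern for pgrep is just an illustration, any way of getting the pid is fine:

    PID=$(pgrep -f CassandraDaemon | head -1)   # or take the pid from the ps output above
    for i in $(seq 1 60); do
        kill -3 "$PID"    # asks the JVM to print a thread dump on its stdout; it does not stop the process
        sleep 2           # 60 dumps at 2 second intervals is roughly 2 minutes of samples
    done

A quick 'grep "java.lang.Thread.State" <stdout logfile> | sort | uniq -c | sort -rn' over the collected dumps already gives a rough impression of how busy or blocked the threads are, before you load the file into one of the tools.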
On Friday 29 January 2016, Jean Carlo <jean.jeancar...@gmail.com> wrote the following:

> I am having the same issue after upgrading cassandra from 2.0.10 to 2.1.12. I
> am not good with the jvm, so I would like to know how to do what @CorryOpdenakker
> proposes with cassandra.
>
> :)
>
> I am checking concurrent_compactors
>
> Regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be> wrote:
>
>> Hi guys,
>> Cassandra is still new for me, but I have a lot of java tuning experience.
>>
>> For root cause detection of performance degradations it's always good to
>> start with collecting a series of java thread dumps. At problem occurrence,
>> use a loop script to take for example 60 thread dumps with an interval of
>> 1 or 2 seconds.
>> Then load those dumps into IBM thread dump analyzer or into "eclipse mat"
>> or any similar tool and see which methods appear to be most active or
>> blocking others.
>>
>> It's really very useful.
>>
>> The same can be done in a normal situation to compare the difference.
>>
>> That should give more insights.
>>
>> Cheers, Corry
>>
>>
>> On Friday 29 January 2016, Peddi, Praveen <pe...@amazon.com> wrote the following:
>>
>>> Hello,
>>> We have another update on performance on 2.1.11. compression_chunk_size
>>> didn't really help much, but we changed concurrent_compactors from default
>>> to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11
>>> read latencies are still 1.5x slower than 2.0.9. One thing we noticed in the
>>> JMX metrics that could affect read latencies is that 2.1.11 is running
>>> ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
>>> 2.0.9, even though our read_repair_chance is the same on both. Could anyone
>>> shed some light on why 2.1.11 could be running read repair 10 to 50 times
>>> more in spite of the same configuration on both clusters?
>>>
>>> dclocal_read_repair_chance=0.100000 AND
>>> read_repair_chance=0.000000 AND
>>>
>>> Here is the table of read repair metrics for both clusters.
>>>
>>>                                       2.0.9   2.1.11
>>> ReadRepairedBackground   5MinAvg      0.006   0.1
>>>                          15MinAvg     0.009   0.153
>>> ReadRepairedBlocking     5MinAvg      0.002   0.55
>>>                          15MinAvg     0.007   0.91
>>>
>>> Thanks
>>> Praveen
>>>
>>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> Reply-To: <user@cassandra.apache.org>
>>> Date: Thursday, January 14, 2016 at 2:58 PM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> Sorry I wasn't as explicit as I should have been.
>>>
>>> The same buffer size is used by compressed reads as well, but tuned with
>>> the compression_chunk_size table property. It's likely true that if you lower
>>> compression_chunk_size, you'll see improved read performance.
>>>
>>> This was covered in the AWS re:Invent youtube link I sent in my original
>>> reply.
>>>
>>>
>>> From: "Peddi, Praveen"
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Thursday, January 14, 2016 at 11:36 AM
>>> To: "user@cassandra.apache.org", Zhiyan Shao
>>> Cc: "Agrawal, Pratik"
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> Hi,
>>> We will try reducing "rar_buffer_size" to 4KB. However
>>> CASSANDRA-10249 <https://issues.apache.org/jira/browse/CASSANDRA-10249> says
>>> "this only affects users who have 1. disabled compression, 2.
>>> switched to buffered i/o from mmap'd". Neither of these is true for us, I
>>> believe. We use the default disk_access_mode, which should be mmap. We also
>>> used LZ4Compressor when creating the table.
>>>
>>> We will let you know if this property has any effect. We were testing
>>> with 2.1.11 and this was only fixed in 2.1.12, so we need to play with the
>>> latest version.
>>>
>>> Praveen
>>>
>>>
>>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> Reply-To: <user@cassandra.apache.org>
>>> Date: Thursday, January 14, 2016 at 1:29 PM
>>> To: Zhiyan Shao <zhiyan.s...@gmail.com>, "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Cc: "Agrawal, Pratik" <paagr...@amazon.com>
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249
>>> / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or
>>> not this is really the case depends on how much of your data is in page
>>> cache, and whether or not you're using mmap. Since the original question
>>> was asked by someone using small RAM instances, it's possible.
>>>
>>> We mitigate this by dropping compression_chunk_size in order to force a
>>> smaller buffer on reads, so we don't over-read very small blocks. This has
>>> other side effects (lower compression ratio, more garbage during
>>> streaming), but significantly speeds up read workloads for us.
>>>
>>>
>>> From: Zhiyan Shao
>>> Date: Thursday, January 14, 2016 at 9:49 AM
>>> To: "user@cassandra.apache.org"
>>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>>> forum, you can find another thread I sent a while ago. The perf test I did
>>> indicated that reads are slower in 2.1.6 than in 2.0.14, so we stayed with
>>> 2.0.14.
>>>
>>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com> wrote:
>>>
>>>> Thanks Jeff for your reply. Sorry for the delayed response. We were
>>>> running some more tests and wanted to wait for the results.
>>>>
>>>> Basically we saw higher CPU on 2.1.11 compared to 2.0.9 (see below) for
>>>> the same exact load test. Memory spikes were also more aggressive on 2.1.11.
>>>>
>>>> To rule out any of our custom settings, we ended up doing some testing
>>>> with the Cassandra stress test and default Cassandra installations.
>>>> Here are the results we saw between 2.0.9 and 2.1.11. Both are default
>>>> installations and both use the Cassandra stress test with the same params,
>>>> so this is the closest apples-to-apples comparison we can get. As you can
>>>> see, both read and write latencies are 30 to 50% worse in 2.1.11 than in
>>>> 2.0.9, even though we are using default installations.
>>>>
>>>> *Highlights of the test:*
>>>> Load: 2x reads and 1x writes
>>>> CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
>>>>
>>>> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
>>>>
>>>> Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
>>>>
>>>> *One observation: as the number of threads is increased, 2.1.11 read
>>>> latencies get worse compared to 2.0.9 (see the table below for 24
>>>> threads vs 54 threads).*
>>>> Not sure if anyone has done this kind of comparison before and what
>>>> their thoughts are.
>>>> I am thinking for this same reason
>>>>
>>>> 2.0.9 Plain
>>>> type                   total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
>>>> 16 threadCount READ    66854      7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
>>>> 16 threadCount WRITE   33146      3572   3572   3572   1.3   1    2.6   3.3   7      206.5  9.3
>>>> 16 threadCount total   100000     10777  10777  10777  1.5   1.3  2.7   3.4   7.9    206.5  9.3
>>>> 2.1.11 Plain
>>>> 16 threadCount READ    67096      6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
>>>> 16 threadCount WRITE   32904      3344   3344   3344   1.4   1.3  2.3   3     6.5    56.7   9.8
>>>> 16 threadCount total   100000     10162  10162  10162  1.6   1.4  2.5   3.2   6      61.7   9.8
>>>> 2.0.9 Plain
>>>> 24 threadCount READ    66414      8167   8167   8167   2     1.6  3.7   7.5   16.7   208    8.1
>>>> 24 threadCount WRITE   33586      4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
>>>> 24 threadCount total   100000     12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
>>>> 2.1.11 Plain
>>>> 24 threadCount READ    66628      7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9
>>>> 24 threadCount WRITE   33372      3723   3723   3723   2     1.9  3.1   3.8   21.9   37.2   9
>>>> 24 threadCount total   100000     11155  11155  11155  2.1   2    3.3   4.1   8.8    38.3   9
>>>> 2.0.9 Plain
>>>> 54 threadCount READ    67115      13419  13419  13419  2.8   2.6  4.2   6.4   36.9   82.4   5
>>>> 54 threadCount WRITE   32885      6575   6575   6575   2.5   2.3  3.9   5.6   15.9   81.5   5
>>>> 54 threadCount total   100000     19993  19993  19993  2.7   2.5  4.1   5.7   13.9   82.4   5
>>>> 2.1.11 Plain
>>>> 54 threadCount READ    66780      8951   8951   8951   4.3   3.9  6.8   9.7   49.4   69.9   7.5
>>>> 54 threadCount WRITE   33220      4453   4453   4453   3.5   3.2  5.7   8.2   36.8   68     7.5
>>>> 54 threadCount total   100000     13404  13404  13404  4     3.7  6.6   9.2   48     69.9   7.5
>>>>
>>>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>>> Date: Thursday, January 7, 2016 at 1:01 AM
>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Peddi Praveen <pe...@amazon.com>
>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>
>>>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>>>> experience was anywhere from 20-60%, depending on workload).
>>>>
>>>> However, it's not necessarily true that everything behaves exactly the
>>>> same - in particular, memtables are different, commitlog segment handling
>>>> is different, and GC params may need to be tuned differently for 2.1 than
>>>> for 2.0.
>>>>
>>>> When the system is busy, what's it actually DOING? Cassandra exposes a
>>>> TON of metrics - have you plugged any into a reporting system to see what's
>>>> going on? Is your latency due to pegged cpu, iowait/disk queues or gc
>>>> pauses?
>>>>
>>>> My colleagues spent a lot of time validating different AWS EBS configs
>>>> (video from re:Invent at https://www.youtube.com/watch?v=1R-mgOcOSd4);
>>>> 2.1 was faster in almost every case, but you're using an instance size I
>>>> don't believe we tried (too little RAM to be viable in production). c3.2xl
>>>> only gives you 15G of ram - most "performance" based systems want 2-4x that
>>>> (people running G1 heaps usually start at 16G heaps and leave another
>>>> 16-30G for page cache). You're running fairly small hardware - it's
>>>> possible that 2.1 isn't "as good" on smaller hardware.
>>>>
>>>> (I do see your domain, presumably you know all of this, but just to be
>>>> sure):
>>>>
>>>> You're using c3, so presumably you're using EBS - are you using GP2?
>>>> Which volume sizes? Are they the same between versions? Are you hitting
>>>> your iops limits? Running out of burst tokens? Do you have enhanced
>>>> networking enabled? At load, what part of your system is stressed?
>>>> Are you cpu bound? Are you seeing GC pauses hurt latency? Have you tried
>>>> changing memtable_allocation_type -> offheap_objects (available in 2.1,
>>>> not in 2.0)?
>>>>
>>>> Tuning gc_grace is weird - do you understand what it does? Are you
>>>> overwriting or deleting a lot of data in your test (that'd be unusual)?
>>>> Are you doing a lot of compaction?
>>>>
>>>>
>>>> From: "Peddi, Praveen"
>>>> Reply-To: "user@cassandra.apache.org"
>>>> Date: Wednesday, January 6, 2016 at 11:41 AM
>>>> To: "user@cassandra.apache.org"
>>>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>
>>>> Hi,
>>>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
>>>> environment with pretty much the same yaml settings in both (we removed
>>>> unused yaml settings and renamed a few others) and we have noticed that
>>>> performance on 2.1.11 is worse compared to 2.0.9. *After more investigation
>>>> we found that the performance gets worse as we increase the replication
>>>> factor on 2.1.11, whereas on 2.0.9 performance is more or less the same.*
>>>> Has anything architecturally changed as far as replication is concerned
>>>> in 2.1.11?
>>>>
>>>> All googling only suggested 2.1.11 should be FASTER than 2.0.9, so we
>>>> are obviously doing something different. However, the client code and
>>>> load test are identical in both cases.
>>>>
>>>> Details:
>>>> Nodes: 3 ec2 c3.2xlarge
>>>> R/W Consistency: QUORUM
>>>> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
>>>> removed unused properties from the yaml file.
>>>> We run aggressive compaction with a low gc_grace (15 mins), but this is
>>>> true for both 2.0.9 and 2.1.11.
>>>>
>>>> As you can see, all p50, p90 and p99 latencies stayed within a 10%
>>>> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
>>>> latencies almost doubled (especially reads are much slower than writes).
>>>>
>>>>                            2.0.9              2.1.11
>>>> # Nodes  RF  # of rows  P50   P90   P99    P50   P90    P99
>>>> READ
>>>> 3        1   450        306   594   747    425   849    1085
>>>> 3        3   450        358   634   877    708   1274   2642
>>>>
>>>> WRITE
>>>> 3        1   10         26    80    179    37    131    196
>>>> 3        3   10         31    96    184    46    166    468
>>>>
>>>> Any pointers on how to debug performance issues will be appreciated.
>>>>
>>>> Praveen
>>>
>>
>

--
----------------------------------
Bestdata.be
Optimised ict
Tel:+32(0)496609576
co...@bestdata.be
----------------------------------