Haven't seen this before, but perhaps it's related to CASSANDRA-10433? This
is just a wild guess since it's in a related code path, but it may be worth
trying out the available patch to see if it helps...

2016-10-28 15:03 GMT-02:00 Dikang Gu <dikan...@gmail.com>:

> We are seeing a huge CPU regression when upgrading one of our 2.0.16 clusters
> to 2.1.14 as well. The 2.1.14 node is not able to handle the same amount of
> read traffic as the 2.0.16 node; in fact, it handles less than 50% of it.
>
> And in the perf results, the first entry climbs as high as 50% as we turn up
> the read traffic, a symbol that never appeared in the 2.0.16 profile.
>
> Any thoughts?
> Thanks
>
>
> Samples: 952K of event 'cycles', Event count (approx.): 229681774560
> Overhead  Shared Object                          Symbol
>    6.52%  perf-196410.map                        [.] Lorg/apache/cassandra/db/marshal/IntegerType;.compare in Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>    4.84%  libzip.so                              [.] adler32
>    2.88%  perf-196410.map                        [.] Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marshal/IntegerType;.compare
>    2.39%  perf-196410.map                        [.] Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marshal/IntegerType;.findMostSignificantByte
>    2.03%  perf-196410.map                        [.] Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>    1.65%  perf-196410.map                        [.] vtable chunks
>    1.44%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
>    1.02%  perf-196410.map                        [.] Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>    1.00%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x0000000000003804
>    0.87%  perf-196410.map                        [.] Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>    0.82%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036dc
>    0.79%  [kernel]                               [k] copy_user_generic_string
>    0.73%  perf-196410.map                        [.] vtable chunks
>    0.71%  perf-196410.map                        [.] Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>    0.70%  [kernel]                               [k] find_busiest_group
>    0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
>    0.68%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>    0.65%  perf-196410.map                        [.] jbyte_disjoint_arraycopy
>    0.64%  [kernel]                               [k] _raw_spin_lock
>    0.63%  [kernel]                               [k] __schedule
>    0.45%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036df
>
> On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <co...@bestdata.be>
> wrote:
>
>> @JC, get the pid of your target java process (something like "ps -ef |
>> grep -i cassandra").
>> Then do a kill -3 <pid> (on unix/linux).
>> Check the stdout logfile of the process;
>> it should contain the thread dump.
>> If you found it, then great!
>> Let that kill -3 run in a loop for about 2 or 3 minutes.
>> Afterwards, copy/paste or load the stdout file into one of the mentioned
>> tools.
>> If you are not familiar with the java internals, those thread dumps
>> will teach you a lot :)
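>>
>> For example, a minimal loop along these lines (the pgrep pattern, interval
>> and log path are just illustrative):
>>
>>   PID=$(pgrep -f CassandraDaemon | head -1)   # or take it from "ps -ef | grep -i cassandra"
>>   for i in $(seq 1 60); do
>>       kill -3 "$PID"     # SIGQUIT: the JVM prints a full thread dump to its stdout log
>>       sleep 2
>>   done
>>
>> The dumps end up in whatever file Cassandra's stdout is redirected to
>> (often something like /var/log/cassandra/output.log on packaged installs).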
>>
>>
>>
>>
>> On Friday, 29 January 2016, Jean Carlo <jean.jeancar...@gmail.com>
>> wrote:
>>
>>> I am having the same issue after upgrading Cassandra from 2.0.10 to 2.1.12.
>>> I am not very familiar with JVM internals, so I would like to know how to do
>>> what @CorryOpdenakker proposes with Cassandra.
>>>
>>> :)
>>>
>>> I will check concurrent_compactors
>>>
>>>
>>> Regards
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be>
>>> wrote:
>>>
>>>> Hi guys,
>>>> Cassandra is still new for me, but I have a lot of java tuning
>>>> experience.
>>>>
>>>> For root cause detection of performance degradations it's always good to
>>>> start by collecting a series of java thread dumps. At the moment the problem
>>>> occurs, use a loop script to take, for example, 60 thread dumps with an
>>>> interval of 1 or 2 seconds.
>>>> Then load those dumps into the IBM thread dump analyzer or into "eclipse mat"
>>>> or any similar tool and see which methods appear to be most active or
>>>> blocking others.
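>>>>
>>>> A sketch of such a loop script using jstack from the JDK (pid lookup, dump
>>>> count and interval are just examples):
>>>>
>>>>   PID=$(pgrep -f CassandraDaemon | head -1)
>>>>   for i in $(seq 1 60); do
>>>>       jstack "$PID" > "threaddump_$(date +%H%M%S).txt"   # one dump per file
>>>>       sleep 2
>>>>   done
>>>>
>>>> The resulting files can then be loaded into the analyzers mentioned above.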
>>>>
>>>> It's really very useful.
>>>>
>>>> The same can be done in a normal situation to compare the difference.
>>>>
>>>> That should give more insights.
>>>>
>>>> Cheers, Corry
>>>>
>>>>
>>>> On Friday, 29 January 2016, Peddi, Praveen <pe...@amazon.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> We have another update on performance on 2.1.11.
>>>>> compression_chunk_size didn't really help much, but we changed
>>>>> concurrent_compactors from the default to 64 in 2.1.11 and read latencies
>>>>> improved significantly. However, 2.1.11 read latencies are still about 1.5x
>>>>> slower than 2.0.9. One thing we noticed in the JMX metrics that could affect
>>>>> read latencies is that 2.1.11 is running ReadRepairedBackground and
>>>>> ReadRepairedBlocking far more frequently than 2.0.9, even though our
>>>>> read_repair_chance is the same on both. Could anyone shed some light on why
>>>>> 2.1.11 could be running read repair 10 to 50 times more in spite of the same
>>>>> configuration on both clusters?
>>>>>
>>>>> dclocal_read_repair_chance=0.100000 AND
>>>>> read_repair_chance=0.000000 AND
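>>>>>
>>>>> (One way to double-check the effective settings on each cluster; the
>>>>> keyspace/table names and yaml path below are just placeholders:)
>>>>>
>>>>>   grep -E '^concurrent_compactors' /etc/cassandra/cassandra.yaml
>>>>>   cqlsh
>>>>>   cqlsh> DESCRIBE TABLE my_keyspace.my_table;   -- shows the read repair chances in effect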
>>>>>
>>>>> Here is the table for read repair metrics for both clusters.
>>>>>                                     2.0.9     2.1.11
>>>>> ReadRepairedBackground   5MinAvg    0.006     0.1
>>>>>                          15MinAvg   0.009     0.153
>>>>> ReadRepairedBlocking     5MinAvg    0.002     0.55
>>>>>                          15MinAvg   0.007     0.91
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>>>> Reply-To: <user@cassandra.apache.org>
>>>>> Date: Thursday, January 14, 2016 at 2:58 PM
>>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Sorry I wasn’t as explicit as I should have been
>>>>>
>>>>> The same buffer size is used by compressed reads as well, but it is tuned
>>>>> with the compression_chunk_size table property. It's likely true that if you
>>>>> lower compression_chunk_size, you'll see improved read performance.
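>>>>>
>>>>> For example, with the 2.1-era CQL syntax (keyspace/table names are
>>>>> placeholders; the knob is the chunk_length_kb compression option, and
>>>>> existing SSTables only pick it up once they are rewritten):
>>>>>
>>>>>   ALTER TABLE my_keyspace.my_table
>>>>>     WITH compression = {'sstable_compression': 'LZ4Compressor',
>>>>>                         'chunk_length_kb': 4};
>>>>>
>>>>>   # then rewrite existing SSTables so they use the new chunk size:
>>>>>   nodetool upgradesstables -a my_keyspace my_table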
>>>>>
>>>>> This was covered in the AWS re:Invent youtube link I sent in my
>>>>> original reply.
>>>>>
>>>>>
>>>>>
>>>>> From: "Peddi, Praveen"
>>>>> Reply-To: "user@cassandra.apache.org"
>>>>> Date: Thursday, January 14, 2016 at 11:36 AM
>>>>> To: "user@cassandra.apache.org", Zhiyan Shao
>>>>> Cc: "Agrawal, Pratik"
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Hi,
>>>>> We will try reducing "rar_buffer_size" to 4KB. However CASSANDRA-10249
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-10249> says "this
>>>>> only affects users who have 1. disabled compression, 2. switched to
>>>>> buffered i/o from mmap'd". Neither of these is true for us, I believe. We use
>>>>> the default disk_access_mode, which should be mmap. We also used
>>>>> LZ4Compressor when creating the table.
>>>>>
>>>>> We will let you know if this property has any effect. We were testing
>>>>> with 2.1.11 and this was only fixed in 2.1.12, so we need to play with the
>>>>> latest version.
>>>>>
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>>>> Reply-To: <user@cassandra.apache.org>
>>>>> Date: Thursday, January 14, 2016 at 1:29 PM
>>>>> To: Zhiyan Shao <zhiyan.s...@gmail.com>, "user@cassandra.apache.org" <
>>>>> user@cassandra.apache.org>
>>>>> Cc: "Agrawal, Pratik" <paagr...@amazon.com>
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 /
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this
>>>>> is really the case depends on how much of your data is in page cache, and
>>>>> whether or not you're using mmap. Since the original question was asked by
>>>>> someone using small RAM instances, it's possible.
>>>>>
>>>>> We mitigate this by dropping compression_chunk_size in order to force
>>>>> a smaller buffer on reads, so we don't over-read very small blocks. This
>>>>> has other side effects (a lower compression ratio, more garbage during
>>>>> streaming), but it significantly speeds up read workloads for us.
>>>>>
>>>>>
>>>>> From: Zhiyan Shao
>>>>> Date: Thursday, January 14, 2016 at 9:49 AM
>>>>> To: "user@cassandra.apache.org"
>>>>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>>>>> forum, you can find another thread I sent a while ago. The perf test I did
>>>>> indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
>>>>> 2.0.14.
>>>>>
>>>>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Jeff for your reply. Sorry for delayed response. We were
>>>>>> running some more tests and wanted to wait for the results.
>>>>>>
>>>>>> So basically we saw that CPU with 2.1.11 was higher compared to
>>>>>> 2.0.9 (see below) for the same exact load test. Memory spikes were also
>>>>>> aggressive on 2.1.11.
>>>>>>
>>>>>> To rule out any of our custom settings, we ended up doing some testing with
>>>>>> the Cassandra stress test and a default Cassandra installation. Here are the
>>>>>> results we saw between 2.0.9 and 2.1.11. Both are default installations and
>>>>>> both use the Cassandra stress test with the same params, so this is the
>>>>>> closest apples-to-apples comparison we can get. As you can see, both read
>>>>>> and write latencies are 30 to 50% worse in 2.1.11 than in 2.0.9, even though
>>>>>> we are using default installations.
>>>>>>
>>>>>> *Highlights of the test:*
>>>>>> Load: 2x reads and 1x writes
>>>>>> CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
>>>>>>
>>>>>> Local read latency: 0.039 ms for 2.0.9 vs 0.066 ms for 2.1.11
>>>>>>
>>>>>> Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
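>>>>>>
>>>>>> For anyone wanting to reproduce a similar run, a sketch of a 2.1-style
>>>>>> cassandra-stress invocation (the 2.0 tool uses different flags; the host
>>>>>> and thread count below are illustrative):
>>>>>>
>>>>>>   cassandra-stress write n=100000 -rate threads=16 -node 10.0.0.1
>>>>>>   cassandra-stress mixed ratio\(read=2,write=1\) n=100000 \
>>>>>>       -rate threads=16 -node 10.0.0.1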
>>>>>>
>>>>>> *One observation: as the number of threads is increased, 2.1.11 read
>>>>>> latencies get worse compared to 2.0.9 (see the table below for 24 threads vs
>>>>>> 54 threads).*
>>>>>> Not sure if anyone has done this kind of comparison before and what
>>>>>> their thoughts are. I am thinking for this same reason
>>>>>>
>>>>>>  threadCount  type   total ops   op/s    pk/s    row/s   mean  med  0.95  0.99  0.999  max    time
>>>>>> 2.0.9 Plain
>>>>>>  16           READ   66854       7205    7205    7205    1.6   1.3  2.8   3.5   9.6    85.3   9.3
>>>>>>  16           WRITE  33146       3572    3572    3572    1.3   1    2.6   3.3   7      206.5  9.3
>>>>>>  16           total  100000      10777   10777   10777   1.5   1.3  2.7   3.4   7.9    206.5  9.3
>>>>>> 2.1.11 Plain
>>>>>>  16           READ   67096       6818    6818    6818    1.6   1.5  2.6   3.5   7.9    61.7   9.8
>>>>>>  16           WRITE  32904       3344    3344    3344    1.4   1.3  2.3   3     6.5    56.7   9.8
>>>>>>  16           total  100000      10162   10162   10162   1.6   1.4  2.5   3.2   6      61.7   9.8
>>>>>> 2.0.9 Plain
>>>>>>  24           READ   66414       8167    8167    8167    2     1.6  3.7   7.5   16.7   208    8.1
>>>>>>  24           WRITE  33586       4130    4130    4130    1.7   1.3  3.4   5.4   25.6   45.4   8.1
>>>>>>  24           total  100000      12297   12297   12297   1.9   1.5  3.5   6.2   15.2   208    8.1
>>>>>> 2.1.11 Plain
>>>>>>  24           READ   66628       7433    7433    7433    2.2   2.1  3.4   4.3   8.4    38.3   9
>>>>>>  24           WRITE  33372       3723    3723    3723    2     1.9  3.1   3.8   21.9   37.2   9
>>>>>>  24           total  100000      11155   11155   11155   2.1   2    3.3   4.1   8.8    38.3   9
>>>>>> 2.0.9 Plain
>>>>>>  54           READ   67115       13419   13419   13419   2.8   2.6  4.2   6.4   36.9   82.4   5
>>>>>>  54           WRITE  32885       6575    6575    6575    2.5   2.3  3.9   5.6   15.9   81.5   5
>>>>>>  54           total  100000      19993   19993   19993   2.7   2.5  4.1   5.7   13.9   82.4   5
>>>>>> 2.1.11 Plain
>>>>>>  54           READ   66780       8951    8951    8951    4.3   3.9  6.8   9.7   49.4   69.9   7.5
>>>>>>  54           WRITE  33220       4453    4453    4453    3.5   3.2  5.7   8.2   36.8   68     7.5
>>>>>>  54           total  100000      13404   13404   13404   4     3.7  6.6   9.2   48     69.9   7.5
>>>>>>
>>>>>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>>>>> Date: Thursday, January 7, 2016 at 1:01 AM
>>>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Peddi
>>>>>> Praveen <pe...@amazon.com>
>>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>>>>>> experience was anywhere from 20-60%, depending on workload).
>>>>>>
>>>>>> However, it’s not necessarily true that everything behaves exactly
>>>>>> the same – in particular, memtables are different, commitlog segment
>>>>>> handling is different, and GC params may need to be tuned differently for
>>>>>> 2.1 than 2.0.
>>>>>>
>>>>>> When the system is busy, what’s it actually DOING? Cassandra exposes
>>>>>> a TON of metrics – have you plugged any into a reporting system to see
>>>>>> what’s going on? Is your latency due to pegged cpu, iowait/disk queues or
>>>>>> gc pauses?
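>>>>>>
>>>>>> A few quick checks that usually narrow this down (the log path and
>>>>>> keyspace/table name below are illustrative):
>>>>>>
>>>>>>   nodetool tpstats                               # backed-up or dropped thread pools
>>>>>>   nodetool cfstats my_keyspace.my_table          # per-table read/write latencies, sstable counts
>>>>>>   iostat -x 5                                    # disk utilisation and queue depth
>>>>>>   grep GCInspector /var/log/cassandra/system.log | tail    # long GC pauses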
>>>>>>
>>>>>> My colleagues spent a lot of time validating different AWS EBS
>>>>>> configs (video from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4),
>>>>>> 2.1 was faster in almost every case, but you’re
>>>>>> using an instance size I don’t believe we tried (too little RAM to be
>>>>>> viable in production).  c3.2xl only gives you 15G of ram – most
>>>>>> “performance” based systems want 2-4x that (people running G1 heaps 
>>>>>> usually
>>>>>> start at 16G heaps and leave another 16-30G for page cache), you’re 
>>>>>> running
>>>>>> fairly small hardware – it’s possible that 2.1 isn’t “as good” on smaller
>>>>>> hardware.
>>>>>>
>>>>>> (I do see your domain, presumably you know all of this, but just to
>>>>>> be sure):
>>>>>>
>>>>>> You’re using c3, so presumably you’re using EBS – are you using GP2?
>>>>>> Which volume sizes? Are they the same between versions? Are you hitting
>>>>>> your iops limits? Running out of burst tokens? Do you have enhanced
>>>>>> networking enabled? At load, what part of your system is stressed? Are 
>>>>>> you
>>>>>> cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing
>>>>>> memtable_allocation_type -> offheap objects  (available in 2.1, not in
>>>>>> 2.0)?
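>>>>>>
>>>>>> If you want to try the offheap memtables mentioned above, it's a
>>>>>> cassandra.yaml change plus a rolling restart (the path below is
>>>>>> install-specific):
>>>>>>
>>>>>>   grep memtable_allocation_type /etc/cassandra/cassandra.yaml
>>>>>>   # memtable_allocation_type: heap_buffers     <- 2.1 default
>>>>>>   # change it to:
>>>>>>   # memtable_allocation_type: offheap_objects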
>>>>>>
>>>>>> Tuning gc_grace is weird – do you understand what it does? Are you
>>>>>> overwriting or deleting a lot of data in your test (that’d be unusual)? 
>>>>>> Are
>>>>>> you doing a lot of compaction?
>>>>>>
>>>>>>
>>>>>> From: "Peddi, Praveen"
>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>> Date: Wednesday, January 6, 2016 at 11:41 AM
>>>>>> To: "user@cassandra.apache.org"
>>>>>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> Hi,
>>>>>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
>>>>>> environment with pretty much the same yaml settings in both (we removed
>>>>>> unused yaml settings and renamed a few others) and we have noticed that
>>>>>> performance on 2.1.11 is worse compared to 2.0.9. *After more investigation
>>>>>> we found that the performance gets worse as we increase the replication
>>>>>> factor on 2.1.11, whereas on 2.0.9 performance is more or less the same.*
>>>>>> Has anything changed architecturally as far as replication is concerned in
>>>>>> 2.1.11?
>>>>>>
>>>>>> All googling only suggested 2.1.11 should be FASTER than 2.0.9, so we
>>>>>> are obviously doing something different. However, the client code and load
>>>>>> test are identical in both cases.
>>>>>>
>>>>>> Details:
>>>>>> Nodes: 3 EC2 c3.2xlarge
>>>>>> R/W Consistency: QUORUM
>>>>>> We renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
>>>>>> removed unused properties from the yaml file.
>>>>>> We run aggressive compaction with a low gc_grace (15 mins),
>>>>>> but this is true for both 2.0.9 and 2.1.11.
>>>>>>
>>>>>> As you can see, all p50, p90 and p99 latencies stayed within a 10%
>>>>>> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
>>>>>> latencies almost doubled (reads especially are much slower than writes).
>>>>>>
>>>>>>                                    2.0.9               2.1.11
>>>>>> # Nodes  RF  # of rows      P50   P90   P99      P50   P90    P99
>>>>>> READ
>>>>>>       3   1        450      306   594   747      425   849   1085
>>>>>>       3   3        450      358   634   877      708  1274   2642
>>>>>>
>>>>>> WRITE
>>>>>>       3   1         10       26    80   179       37   131    196
>>>>>>       3   3         10       31    96   184       46   166    468
>>>>>> Any pointers on how to debug performance issues will be appreciated.
>>>>>>
>>>>>> Praveen
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> ----------------------------------
>>>> Bestdata.be
>>>> Optimised ict
>>>> Tel:+32(0)496609576
>>>> co...@bestdata.be
>>>> ----------------------------------
>>>>
>>>>
>>>
>>
>> --
>> ----------------------------------
>> Bestdata.be
>> Optimised ict
>> Tel:+32(0)496609576
>> co...@bestdata.be
>> ----------------------------------
>>
>>
>
>
> --
> Dikang
>
>
