I am having the same issue after upgrading Cassandra from 2.0.10 to 2.1.12. I am
not well versed in JVM tuning, so I would like to know how to do what
@CorryOpdenakker proposes with Cassandra.
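Something like the following loop script should do what Corry describes further
down in this thread. This is only a minimal sketch: it assumes Python 3, a JDK
jstack on the PATH, and the Cassandra PID passed as the only argument; the
script name and file names are illustrative.

    #!/usr/bin/env python3
    # collect_thread_dumps.py - illustrative helper, not part of Cassandra.
    # Takes 60 thread dumps of the Cassandra JVM, 2 seconds apart, so they
    # can later be loaded into IBM Thread Dump Analyzer, Eclipse MAT or a
    # similar tool.
    import subprocess
    import sys
    import time

    def collect(pid, count=60, interval=2):
        for i in range(count):
            with open("threaddump_%03d.txt" % i, "w") as out:
                subprocess.run(["jstack", str(pid)], stdout=out, check=False)
            time.sleep(interval)

    if __name__ == "__main__":
        # usage: collect_thread_dumps.py <cassandra-pid>
        collect(int(sys.argv[1]))

Run it as the same OS user as the Cassandra process, once while the latency
problem is occurring and once during a quiet period, so the two sets of dumps
can be compared.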
:) I will check concurrent_compactors.

Regards,

Jean Carlo

"The best way to predict the future is to invent it"  Alan Kay

On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be> wrote:

> Hi guys,
> Cassandra is still new for me, but I have a lot of Java tuning experience.
>
> For root-cause detection of performance degradations it's always good to
> start by collecting a series of Java thread dumps. At the moment the problem
> occurs, use a loop script to take, for example, 60 thread dumps with an
> interval of 1 or 2 seconds.
> Then load those dumps into the IBM thread dump analyzer, "Eclipse MAT" or any
> similar tool and see which methods appear to be most active or are blocking
> others.
>
> It's really very useful.
>
> The same can be done in a normal situation to compare the difference.
>
> That should give more insights.
>
> Cheers, Corry
>
>
> On Friday, January 29, 2016, Peddi, Praveen <pe...@amazon.com> wrote:
>
>> Hello,
>> We have another update on performance on 2.1.11. compression_chunk_size
>> didn't really help much, but we changed concurrent_compactors from the
>> default to 64 in 2.1.11 and read latencies improved significantly. However,
>> 2.1.11 read latencies are still 1.5x slower than 2.0.9. One thing we noticed
>> in the JMX metrics that could affect read latencies is that 2.1.11 is
>> running ReadRepairedBackground and ReadRepairedBlocking much more frequently
>> than 2.0.9, even though our read_repair_chance is the same on both. Could
>> anyone shed some light on why 2.1.11 could be running read repair 10 to 50
>> times more often in spite of the same configuration on both clusters?
>>
>> dclocal_read_repair_chance=0.100000 AND
>> read_repair_chance=0.000000 AND
>>
>> Here is the table of read repair metrics for both clusters.
>>
>>                                        2.0.9    2.1.11
>>   ReadRepairedBackground   5MinAvg     0.006    0.1
>>                            15MinAvg    0.009    0.153
>>   ReadRepairedBlocking     5MinAvg     0.002    0.55
>>                            15MinAvg    0.007    0.91
>>
>> Thanks
>> Praveen
>>
>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Thursday, January 14, 2016 at 2:58 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Sorry I wasn't as explicit as I should have been.
>>
>> The same buffer size is used by compressed reads as well, but it is tuned
>> with the compression_chunk_size table property. It's likely true that if you
>> lower compression_chunk_size, you'll see improved read performance.
>>
>> This was covered in the AWS re:Invent YouTube link I sent in my original
>> reply.
>>
>>
>>
>> From: "Peddi, Praveen"
>> Reply-To: "user@cassandra.apache.org"
>> Date: Thursday, January 14, 2016 at 11:36 AM
>> To: "user@cassandra.apache.org", Zhiyan Shao
>> Cc: "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Hi,
>> We will try reducing "rar_buffer_size" to 4KB. However, CASSANDRA-10249
>> <https://issues.apache.org/jira/browse/CASSANDRA-10249> says "this only
>> affects users who have 1. disabled compression, 2. switched to buffered i/o
>> from mmap'd". Neither of these is true for us, I believe. We use the default
>> disk_access_mode, which should be mmap, and we used LZ4Compressor when we
>> created the table.
>>
>> We will let you know if this property has any effect. We were testing with
>> 2.1.11 and this was only fixed in 2.1.12, so we need to test with the latest
>> version.
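For reference, a minimal sketch of the chunk-size change Jeff suggests above,
using the DataStax Python driver (cassandra-driver). The contact point,
keyspace and table names are placeholders, and the 4 KB value is only the one
being experimented with in this thread; in 2.1 CQL the chunk size is set
through the table's compression options ('chunk_length_kb').

    # Sketch: lower the compression chunk size on one table.
    # Assumes the DataStax Python driver (pip install cassandra-driver);
    # contact point, keyspace and table names are placeholders.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # 'chunk_length_kb' is expressed in kilobytes.
    session.execute("""
        ALTER TABLE my_keyspace.my_table
        WITH compression = {'sstable_compression': 'LZ4Compressor',
                            'chunk_length_kb': '4'}
    """)
    cluster.shutdown()

Existing SSTables keep their old chunk size until they are rewritten, e.g. with
nodetool upgradesstables -a my_keyspace my_table, so the read test is only
meaningful after that rewrite completes.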
>>
>> Praveen
>>
>>
>>
>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Thursday, January 14, 2016 at 1:29 PM
>> To: Zhiyan Shao <zhiyan.s...@gmail.com>, "user@cassandra.apache.org"
>> <user@cassandra.apache.org>
>> Cc: "Agrawal, Pratik" <paagr...@amazon.com>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 /
>> https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this
>> is really the case depends on how much of your data is in page cache, and
>> whether or not you're using mmap. Since the original question was asked by
>> someone using small-RAM instances, it's possible.
>>
>> We mitigate this by dropping compression_chunk_size in order to force a
>> smaller buffer on reads, so we don't over-read very small blocks. This has
>> other side effects (lower compression ratio, more garbage during streaming),
>> but it significantly speeds up read workloads for us.
>>
>>
>> From: Zhiyan Shao
>> Date: Thursday, January 14, 2016 at 9:49 AM
>> To: "user@cassandra.apache.org"
>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Praveen, if you search for "Read is slower in 2.1.6 than 2.0.14" in this
>> forum, you can find another thread I sent a while ago. The perf test I did
>> indicated that reads are slower on 2.1.6 than on 2.0.14, so we stayed with
>> 2.0.14.
>>
>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com> wrote:
>>
>>> Thanks Jeff for your reply. Sorry for the delayed response. We were running
>>> some more tests and wanted to wait for the results.
>>>
>>> Basically we saw higher CPU on 2.1.11 compared to 2.0.9 (see below) for the
>>> exact same load test. Memory spikes were also more aggressive on 2.1.11.
>>>
>>> We wanted to rule out any of our custom settings, so we ended up doing some
>>> testing with cassandra-stress against default Cassandra installations. Here
>>> are the results we saw between 2.0.9 and 2.1.11. Both are default
>>> installations and both use cassandra-stress with the same parameters, so
>>> this is the closest apples-to-apples comparison we can get. As you can see,
>>> both read and write latencies are 30 to 50% worse in 2.1.11 than in 2.0.9,
>>> and since we are using default installations, this cannot be due to our
>>> custom settings.
>>>
>>> *Highlights of the test:*
>>> Load: 2x reads and 1x writes
>>> CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
>>> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
>>> Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
>>>
>>> *One observation: as the number of threads increases, 2.1.11 read latencies
>>> get worse compared to 2.0.9 (see the table below for 24 threads vs 54
>>> threads).*
>>> Not sure if anyone has done this kind of comparison before and what their
>>> thoughts are.
I am thinking for this same reason.

>>>
>>>                     type    total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
>>> 2.0.9 Plain
>>>   16 threadCount    READ    66854      7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
>>>   16 threadCount    WRITE   33146      3572   3572   3572   1.3   1    2.6   3.3   7      206.5  9.3
>>>   16 threadCount    total   100000     10777  10777  10777  1.5   1.3  2.7   3.4   7.9    206.5  9.3
>>> 2.1.11 Plain
>>>   16 threadCount    READ    67096      6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
>>>   16 threadCount    WRITE   32904      3344   3344   3344   1.4   1.3  2.3   3     6.5    56.7   9.8
>>>   16 threadCount    total   100000     10162  10162  10162  1.6   1.4  2.5   3.2   6      61.7   9.8
>>> 2.0.9 Plain
>>>   24 threadCount    READ    66414      8167   8167   8167   2     1.6  3.7   7.5   16.7   208    8.1
>>>   24 threadCount    WRITE   33586      4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
>>>   24 threadCount    total   100000     12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
>>> 2.1.11 Plain
>>>   24 threadCount    READ    66628      7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9
>>>   24 threadCount    WRITE   33372      3723   3723   3723   2     1.9  3.1   3.8   21.9   37.2   9
>>>   24 threadCount    total   100000     11155  11155  11155  2.1   2    3.3   4.1   8.8    38.3   9
>>> 2.0.9 Plain
>>>   54 threadCount    READ    67115      13419  13419  13419  2.8   2.6  4.2   6.4   36.9   82.4   5
>>>   54 threadCount    WRITE   32885      6575   6575   6575   2.5   2.3  3.9   5.6   15.9   81.5   5
>>>   54 threadCount    total   100000     19993  19993  19993  2.7   2.5  4.1   5.7   13.9   82.4   5
>>> 2.1.11 Plain
>>>   54 threadCount    READ    66780      8951   8951   8951   4.3   3.9  6.8   9.7   49.4   69.9   7.5
>>>   54 threadCount    WRITE   33220      4453   4453   4453   3.5   3.2  5.7   8.2   36.8   68     7.5
>>>   54 threadCount    total   100000     13404  13404  13404  4     3.7  6.6   9.2   48     69.9   7.5
>>>
>>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> Date: Thursday, January 7, 2016 at 1:01 AM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Peddi Praveen
>>> <pe...@amazon.com>
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>>> experience was anywhere from 20-60%, depending on workload).
>>>
>>> However, it's not necessarily true that everything behaves exactly the same
>>> – in particular, memtables are different, commitlog segment handling is
>>> different, and GC params may need to be tuned differently for 2.1 than for
>>> 2.0.
>>>
>>> When the system is busy, what's it actually DOING? Cassandra exposes a TON
>>> of metrics – have you plugged any into a reporting system to see what's
>>> going on? Is your latency due to pegged CPU, iowait/disk queues or GC
>>> pauses?
>>>
>>> My colleagues spent a lot of time validating different AWS EBS configs
>>> (video from re:Invent at https://www.youtube.com/watch?v=1R-mgOcOSd4). 2.1
>>> was faster in almost every case, but you're using an instance size I don't
>>> believe we tried (too little RAM to be viable in production). c3.2xl only
>>> gives you 15G of RAM – most "performance"-oriented systems want 2-4x that
>>> (people running G1 heaps usually start at 16G heaps and leave another
>>> 16-30G for page cache). You're running fairly small hardware – it's
>>> possible that 2.1 isn't "as good" on smaller hardware.
>>>
>>> (I do see your domain, presumably you know all of this, but just to be
>>> sure):
>>>
>>> You're using c3, so presumably you're using EBS – are you using GP2? Which
>>> volume sizes? Are they the same between versions? Are you hitting your IOPS
>>> limits? Running out of burst tokens? Do you have enhanced networking
>>> enabled? At load, what part of your system is stressed? Are you CPU bound?
>>> Are you seeing GC pauses hurt latency?
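For the "what is it actually DOING" question above, a rough sketch that samples
nodetool tpstats and flags backed-up thread pools. It assumes nodetool is on
the PATH; the tpstats column layout differs slightly between versions, so the
parsing is deliberately loose, and the sampling interval is arbitrary.

    # Sketch: sample "nodetool tpstats" and print pools with pending or
    # blocked tasks, as a rough view of which stage is backed up under load.
    import subprocess
    import time

    def sample(interval=10, rounds=6):
        for _ in range(rounds):
            out = subprocess.run(["nodetool", "tpstats"],
                                 stdout=subprocess.PIPE,
                                 universal_newlines=True).stdout
            for line in out.splitlines():
                parts = line.split()
                # pool rows look like:
                # <PoolName> <Active> <Pending> <Completed> <Blocked> <All time blocked>
                if len(parts) >= 5 and parts[1].isdigit():
                    active, pending, blocked = int(parts[1]), int(parts[2]), int(parts[4])
                    if pending > 0 or blocked > 0:
                        print(parts[0],
                              "active=%d pending=%d blocked=%d" % (active, pending, blocked))
            print("---")
            time.sleep(interval)

    if __name__ == "__main__":
        sample()

Persistent pending counts on ReadStage point at CPU or disk pressure on reads,
while blocked flush or compaction pools point at write/compaction pressure.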
>>> Have you tried changing memtable_allocation_type -> offheap_objects
>>> (available in 2.1, not in 2.0)?
>>>
>>> Tuning gc_grace is weird – do you understand what it does? Are you
>>> overwriting or deleting a lot of data in your test (that'd be unusual)? Are
>>> you doing a lot of compaction?
>>>
>>>
>>> From: "Peddi, Praveen"
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Wednesday, January 6, 2016 at 11:41 AM
>>> To: "user@cassandra.apache.org"
>>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> Hi,
>>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our load test
>>> environment with pretty much the same yaml settings in both (we removed
>>> unused yaml settings and renamed a few others), and we have noticed that
>>> performance on 2.1.11 is worse compared to 2.0.9. *After more investigation
>>> we found that performance gets worse as we increase the replication factor
>>> on 2.1.11, whereas on 2.0.9 performance stays more or less the same.* Has
>>> anything changed architecturally as far as replication is concerned in
>>> 2.1.11?
>>>
>>> All googling only suggested that 2.1.11 should be FASTER than 2.0.9, so we
>>> are obviously doing something differently. However, the client code and
>>> load test are identical in both cases.
>>>
>>> Details:
>>> Nodes: 3 EC2 c3.2xlarge
>>> R/W Consistency: QUORUM
>>> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and removed
>>> unused properties from the yaml file.
>>> We run aggressive compaction with a low gc_grace (15 mins), but this is
>>> true for both 2.0.9 and 2.1.11.
>>>
>>> As you can see, all p50, p90 and p99 latencies stayed within a 10%
>>> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
>>> latencies almost doubled (especially reads are much slower than writes).
>>>
>>>          # Nodes   RF   # of rows        2.0.9               2.1.11
>>>                                     P50   P90   P99    P50   P90    P99
>>>   READ      3       1      450      306   594   747    425   849    1085
>>>             3       3      450      358   634   877    708   1274   2642
>>>   WRITE     3       1       10       26    80   179     37   131     196
>>>             3       3       10       31    96   184     46   166     468
>>>
>>> Any pointers on how to debug performance issues will be appreciated.
>>>
>>> Praveen
>>>
>>
>>
>
> --
> ----------------------------------
> Bestdata.be
> Optimised ict
> Tel: +32(0)496609576
> co...@bestdata.be
> ----------------------------------
>
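To compare the yaml settings mentioned in this thread between the two clusters,
a minimal sketch assuming PyYAML is installed and the two cassandra.yaml files
have been copied locally; the file paths and the key list are illustrative
only.

    # Sketch: diff a handful of cassandra.yaml keys between the two clusters.
    # Requires PyYAML (pip install pyyaml).
    import yaml

    KEYS = [
        "concurrent_compactors",
        "memtable_heap_space_in_mb",   # renamed from memtable_total_space_in_mb
        "memtable_allocation_type",    # e.g. offheap_objects on 2.1
        "disk_access_mode",            # absent means the default (auto/mmap)
    ]

    def load(path):
        with open(path) as f:
            return yaml.safe_load(f) or {}

    old_conf = load("cassandra-2.0.9.yaml")
    new_conf = load("cassandra-2.1.11.yaml")

    for key in KEYS:
        print("%-28s 2.0.9=%-16s 2.1.11=%s"
              % (key, old_conf.get(key, "<default>"), new_conf.get(key, "<default>")))

Keys that are unset fall back to the version's built-in defaults, which can
differ between 2.0 and 2.1, so "<default>" on both sides does not necessarily
mean the effective values match.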