Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum, you can find a thread I posted a while ago. The perf test I did indicated that reads are slower on 2.1.6 than on 2.0.14, so we stayed with 2.0.14.
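
If you want to reproduce that kind of comparison yourself, stock
cassandra-stress is enough. A minimal sketch, assuming the 2.1-style stress
syntax; the host and the count/thread values are placeholders, not the exact
parameters from my earlier thread:

    # Populate once, then measure reads with identical parameters on each version.
    cassandra-stress write n=1000000 -node <host> -rate threads=54
    cassandra-stress read  n=1000000 -node <host> -rate threads=54

Run the same pair against each cluster and compare the op/s and latency
columns it reports.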
On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com> wrote:

> Thanks Jeff for your reply. Sorry for the delayed response. We were running
> some more tests and wanted to wait for the results.
>
> So basically we saw higher CPU on 2.1.11 compared to 2.0.9 (see below) for
> the same exact load test. Memory spikes were also more aggressive on 2.1.11.
>
> To rule out any of our custom settings, we ended up doing some testing with
> the Cassandra stress test against default Cassandra installations. Here are
> the results we saw between 2.0.9 and 2.1.11. Both are default installations
> and both use the Cassandra stress test with the same params, so this is the
> closest apples-to-apples comparison we can get. As you can see, both read
> and write latencies are 30 to 50% worse on 2.1.11 than on 2.0.9.
>
> *Highlights of the test:*
> Load: 2x reads and 1x writes
> CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
> Local read latency: 0.039 ms for 2.0.9 vs 0.066 ms for 2.1.11
> Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
>
> *One observation: as the number of threads increases, 2.1.11 read latencies
> get worse relative to 2.0.9 (see the table below for 24 threads vs 54
> threads).*
>
> Not sure if anyone has done this kind of comparison before and what their
> thoughts are. I am thinking for this same reason
>
> Results (type: Plain):
>
> version  threads  type    total ops   op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
> 2.0.9    16       READ     66854       7205   7205   7205  1.6   1.3  2.8   3.5    9.6    85.3  9.3
> 2.0.9    16       WRITE    33146       3572   3572   3572  1.3   1    2.6   3.3    7     206.5  9.3
> 2.0.9    16       total   100000      10777  10777  10777  1.5   1.3  2.7   3.4    7.9   206.5  9.3
> 2.1.11   16       READ     67096       6818   6818   6818  1.6   1.5  2.6   3.5    7.9    61.7  9.8
> 2.1.11   16       WRITE    32904       3344   3344   3344  1.4   1.3  2.3   3      6.5    56.7  9.8
> 2.1.11   16       total   100000      10162  10162  10162  1.6   1.4  2.5   3.2    6      61.7  9.8
> 2.0.9    24       READ     66414       8167   8167   8167  2     1.6  3.7   7.5   16.7   208    8.1
> 2.0.9    24       WRITE    33586       4130   4130   4130  1.7   1.3  3.4   5.4   25.6    45.4  8.1
> 2.0.9    24       total   100000      12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
> 2.1.11   24       READ     66628       7433   7433   7433  2.2   2.1  3.4   4.3    8.4    38.3  9
> 2.1.11   24       WRITE    33372       3723   3723   3723  2     1.9  3.1   3.8   21.9    37.2  9
> 2.1.11   24       total   100000      11155  11155  11155  2.1   2    3.3   4.1    8.8    38.3  9
> 2.0.9    54       READ     67115      13419  13419  13419  2.8   2.6  4.2   6.4   36.9    82.4  5
> 2.0.9    54       WRITE    32885       6575   6575   6575  2.5   2.3  3.9   5.6   15.9    81.5  5
> 2.0.9    54       total   100000      19993  19993  19993  2.7   2.5  4.1   5.7   13.9    82.4  5
> 2.1.11   54       READ     66780       8951   8951   8951  4.3   3.9  6.8   9.7   49.4    69.9  7.5
> 2.1.11   54       WRITE    33220       4453   4453   4453  3.5   3.2  5.7   8.2   36.8    68    7.5
> 2.1.11   54       total   100000      13404  13404  13404  4     3.7  6.6   9.2   48      69.9  7.5
>
> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
> Date: Thursday, January 7, 2016 at 1:01 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Peddi Praveen <pe...@amazon.com>
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
> experience was anywhere from 20-60%, depending on workload).
>
> However, it's not necessarily true that everything behaves exactly the
> same – in particular, memtables are different, commitlog segment handling
> is different, and GC params may need to be tuned differently for 2.1 than
> for 2.0.
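A quick way to check what those memtable knobs are actually set to on the
2.1 side (assuming a tarball layout where the config lives at
conf/cassandra.yaml; package installs usually keep it under /etc/cassandra/):

    # All three keys are real 2.1 cassandra.yaml options; the values shown
    # are whatever you configured, or absent if you rely on the defaults.
    grep -E 'memtable_(heap|offheap)_space_in_mb|memtable_allocation_type' \
        conf/cassandra.yaml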
> When the system is busy, what's it actually DOING? Cassandra exposes a TON
> of metrics – have you plugged any into a reporting system to see what's
> going on? Is your latency due to pegged CPU, iowait/disk queues, or GC
> pauses?
>
> My colleagues spent a lot of time validating different AWS EBS configs
> (video from re:Invent at https://www.youtube.com/watch?v=1R-mgOcOSd4). 2.1
> was faster in almost every case, but you're using an instance size I don't
> believe we tried (too little RAM to be viable in production). c3.2xlarge
> only gives you 15G of RAM – most "performance"-based systems want 2-4x that
> (people running G1 heaps usually start at 16G heaps and leave another
> 16-30G for page cache). You're running fairly small hardware – it's
> possible that 2.1 isn't "as good" on smaller hardware.
>
> (I do see your domain, presumably you know all of this, but just to be
> sure:)
>
> You're using c3, so presumably you're using EBS – are you using GP2? Which
> volume sizes? Are they the same between versions? Are you hitting your IOPS
> limits? Running out of burst tokens? Do you have enhanced networking
> enabled? At load, what part of your system is stressed? Are you CPU bound?
> Are you seeing GC pauses hurt latency? Have you tried changing
> memtable_allocation_type to offheap_objects (available in 2.1, not in 2.0)?
>
> Tuning gc_grace is weird – do you understand what it does? Are you
> overwriting or deleting a lot of data in your test (that'd be unusual)? Are
> you doing a lot of compaction?
>
>
> From: "Peddi, Praveen"
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday, January 6, 2016 at 11:41 AM
> To: "user@cassandra.apache.org"
> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Hi,
> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our load test
> environment with pretty much the same yaml settings in both (we removed
> unused yaml settings and renamed a few others), and we have noticed that
> performance on 2.1.11 is worse than on 2.0.9. *After more investigation we
> found that performance gets worse as we increase the replication factor on
> 2.1.11, whereas on 2.0.9 performance stays more or less the same.* Has
> anything architecturally changed as far as replication is concerned in
> 2.1.11?
>
> All googling only suggested 2.1.11 should be FASTER than 2.0.9, so we are
> obviously doing something different. However, the client code and load test
> are identical in both cases.
>
> Details:
> Nodes: 3 EC2 c3.2xlarge
> R/W consistency: QUORUM
> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and removed
> unused properties from the yaml file.
> We run aggressive compaction with a low gc_grace (15 mins), but this is
> true for both 2.0.9 and 2.1.11.
>
> As you can see, all P50, P90 and P99 latencies stayed within a 10%
> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
> latencies almost doubled (reads especially are much slower than writes).
>
>                               2.0.9              2.1.11
>        Nodes  RF  # of rows   P50   P90   P99    P50   P90   P99
> READ   3      1   450         306   594   747    425   849   1085
>        3      3   450         358   634   877    708   1274  2642
> WRITE  3      1   10          26    80    179    37    131   196
>        3      3   10          31    96    184    46    166   468
>
> Any pointers on how to debug performance issues will be appreciated.
>
> Praveen
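
P.S. On Jeff's "what is it actually DOING" questions: the stock tools get you
most of the way before wiring metrics into a reporting system. A minimal first
pass, with the keyspace/table names as placeholders:

    nodetool tpstats                    # pending/blocked thread-pool stages
    nodetool cfhistograms <ks> <table>  # per-table read/write latency distributions
    nodetool compactionstats            # is compaction keeping up?
    iostat -x 5                         # iowait, disk queue depth, IOPS ceilings
    jstat -gcutil <cassandra_pid> 1000  # GC counts/times sampled every second

If CPU is pegged while tpstats shows little pending work, suspect GC; if
iostat shows deep queues, suspect EBS IOPS or burst-token limits.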