I'm not surprised that when you profile Cassandra you see some lock contention, particularly given its SEDA architecture: threads end up doing a lot of waiting while requests make their way through the various stages.
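If you want to confirm that those futex waits really are just threads parked between stages rather than something pathological, a quick check on one node along these lines is usually enough (a rough sketch only: the pgrep pattern assumes the process command line contains "CassandraDaemon", the 30-second window is arbitrary, and strace adds real overhead, so don't run it across the whole cluster):

    # summarise futex syscalls across all Cassandra threads for ~30s
    # (strace prints the -c summary when it exits)
    timeout 30 sudo strace -f -c -e trace=futex -p $(pgrep -f CassandraDaemon)

    # grab a thread dump and see which pools the WAITING/parked threads belong to
    jstack $(pgrep -f CassandraDaemon) > /tmp/cassandra-threads.txt

If the parked threads are the usual SharedPool / MessagingService workers, that's just the staged architecture doing its thing; if something else dominates, it's worth digging further.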
For more background on why this is expected, see:
https://wiki.apache.org/cassandra/ArchitectureInternals
https://issues.apache.org/jira/browse/CASSANDRA-10989
https://blogs.oracle.com/roland/entry/real_time_java_and_futexes

So I would say the futex/thread-wait issue is a red herring in this case, given it will be inherent in most Cassandra deployments. The caveat is that you are running 3.2.1, which is a very new version of Cassandra that may have a new bug, and I'm not sure how many people here have experience with it, especially since the new tick-tock release approach makes it hard to judge when a release is ready for prime time.

Otherwise, follow the good folks at CrowdStrike for getting good performance out of EBS (http://www.slideshare.net/jimplush/1-million-writes-per-second-on-60-nodes-with-cassandra-and-ebs). They have done all the hard work for the rest of us.

Reduce your JVM heap size to something closer to 8 GB. Given that your cluster hasn't seen a production workload, I wouldn't worry about tuning the heap further unless you see GC pressure in the logs; you don't want to spend a lot of time tuning for backloading when the actual traffic will be, or could be, different.

The performance you are getting is roughly on par with what we have seen in some early benchmarking of EBS volumes (https://www.instaclustr.com/2015/10/28/cassandra-on-aws-ebs-infrastructure/), but with machines half the size. We decided to go down a slightly different path and use m4.xlarges; we are always playing with different configurations to see what works best.
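On the heap-size point: in 3.x the heap normally lives in conf/jvm.options (or MAX_HEAP_SIZE in cassandra-env.sh, whichever you're already using), so as a rough starting point rather than a tuned recommendation:

    # conf/jvm.options -- fixed 8 GB heap, min = max so the JVM never resizes it
    -Xms8G
    -Xmx8G

Then just keep an eye on the GC logs while you backload and only tune further if you actually see pressure.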
On Sat, 6 Feb 2016 at 16:50 Will Hayworth <whaywo...@atlassian.com> wrote:

> Additionally: this isn't the futex_wait bug (or at least it shouldn't be?). Amazon says <https://forums.aws.amazon.com/thread.jspa?messageID=623731> that was fixed several kernel versions before mine, which is 4.1.10-17.31.amzn1.x86_64. And the reason my heap is so large is because, per CASSANDRA-9472, we can't use offheap until 3.4 is released.
>
> Will
>
> ___________________________________________________________
> Will Hayworth
> Developer, Engagement Engine
> Atlassian
>
> My pronoun is "they". <http://pronoun.is/they>
>
> On Sat, Feb 6, 2016 at 3:28 PM, Will Hayworth <whaywo...@atlassian.com> wrote:
>
>> *tl;dr: other than CAS operations, what are the potential sources of lock contention in C*?*
>>
>> Hi all! :) I'm a novice Cassandra and Linux admin who's been preparing a small cluster for production, and I've been seeing something weird. For background: I'm running 3.2.1 on a cluster of 12 EC2 m4.2xlarges (32 GB RAM, 8 HT cores) backed by 3.5 TB GP2 EBS volumes. Until late yesterday, that was a cluster of 12 m4.xlarges with 3 TB volumes. I bumped it because while backloading historical data I had been seeing awful throughput (20K op/s at CL.ONE). I'd read through Al Tobey's *amazing* C* tuning guide <https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html> once or twice before, but this time I was careful and fixed a bunch of defaults that just weren't right, in cassandra.yaml/JVM options/block device parameters. Folks on IRC were super helpful as always (hat tip to Jeff Jirsa in particular) and pointed out, for example, that I shouldn't be using DTCS for loading historical data--heh. After changing to LTCS, unbatching my writes* and reserving a CPU core for interrupts and fixing the clocksource to TSC, I finally hit 80K early this morning. Hooray! :)
>>
>> Now, my question: I'm still seeing a *ton* of blocked processes in the vmstats, anything from 2 to 9 per 10 second sample period--and this is before EBS is even being hit! I've been trying in vain to figure out what this could be--GC seems very quiet, after all. On Al's page's advice, I've been running strace and, indeed, I've been seeing *tens of thousands of futex() calls* in periods of 10 or 20 seconds. What eludes me is *where* this lock contention is coming from. I'm not using LWTs or performing CAS operations of which I'm aware. Assuming this isn't a red herring, what gives?
>>
>> Sorry for the essay--I just wanted to err on the side of more context--and *thank you* for any advice you'd like to offer,
>> Will
>>
>> P.S. More background if you'd like--I'm running on Amazon Linux 2015.09, using jemalloc 3.6, JDK 1.8.0_65-b17. Here <http://pastebin.com/kuhBmHXG> is my cassandra.yaml and here <http://pastebin.com/fyXeTfRa> are my JVM args. I realized I neglected to adjust memtable_flush_writers as I was writing this--so I'll get on that. Aside from that, I'm not sure what to do. (Thanks, again, for reading.)
>>
>> * They were batched for consistency--I'm hoping to return to using them when I'm back at normal load, which is tiny compared to backloading, but the impact on performance was eye-opening.
>> ___________________________________________________________
>> Will Hayworth
>> Developer, Engagement Engine
>> Atlassian
>>
>> My pronoun is "they". <http://pronoun.is/they>

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer