Hi Rob,

Thanks for the reply. To answer your questions below:
I'm using the following JVM:

  java version "1.7.0_10"
  Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
  Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

Cassandra is stock version 1.2.2.

Heap settings are as follows (the system has 48GB of physical RAM, with over 2/3 of it unused while Cassandra is running):

  -Xms8192M -Xmx8192M -Xmn2048M

I don't see any GC logs around the time of the slowness; see the note after the command line for how I plan to double-check that.

For now I'm testing with a single node only. I'm expecting to have multiple nodes with all data fully replicated (e.g. 2 nodes, replication factor 2), so I'm testing with a single node to check performance in the case where all other nodes have failed. Ideally I'd like to get away with as few nodes as possible.

I've put the full command line of the running java process below, in case any of it is relevant:

/usr/jdk1.7.0_10/bin/java -ea -javaagent:/dsc-cassandra-1.2.2/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -cp /dsc-cassandra-1.2.2/bin/../conf:/dsc-cassandra-1.2.2/bin/../build/classes/main:/dsc-cassandra-1.2.2/bin/../build/classes/thrift:/dsc-cassandra-1.2.2/bin/../lib/antlr-3.2.jar:/dsc-cassandra-1.2.2/bin/../lib/apache-cassandra-1.2.2.jar:/dsc-cassandra-1.2.2/bin/../lib/apache-cassandra-clientutil-1.2.2.jar:/dsc-cassandra-1.2.2/bin/../lib/apache-cassandra-thrift-1.2.2.jar:/dsc-cassandra-1.2.2/bin/../lib/avro-1.4.0-fixes.jar:/dsc-cassandra-1.2.2/bin/../lib/avro-1.4.0-sources-fixes.jar:/dsc-cassandra-1.2.2/bin/../lib/commons-cli-1.1.jar:/dsc-cassandra-1.2.2/bin/../lib/commons-codec-1.2.jar:/dsc-cassandra-1.2.2/bin/../lib/commons-lang-2.6.jar:/dsc-cassandra-1.2.2/bin/../lib/compress-lzf-0.8.4.jar:/dsc-cassandra-1.2.2/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:/dsc-cassandra-1.2.2/bin/../lib/guava-13.0.1.jar:/dsc-cassandra-1.2.2/bin/../lib/high-scale-lib-1.1.2.jar:/dsc-cassandra-1.2.2/bin/../lib/jackson-core-asl-1.9.2.jar:/dsc-cassandra-1.2.2/bin/../lib/jackson-mapper-asl-1.9.2.jar:/dsc-cassandra-1.2.2/bin/../lib/jamm-0.2.5.jar:/dsc-cassandra-1.2.2/bin/../lib/jbcrypt-0.3m.jar:/dsc-cassandra-1.2.2/bin/../lib/jline-1.0.jar:/dsc-cassandra-1.2.2/bin/../lib/jna-3.4.0.jar:/dsc-cassandra-1.2.2/bin/../lib/json-simple-1.1.jar:/dsc-cassandra-1.2.2/bin/../lib/libthrift-0.7.0.jar:/dsc-cassandra-1.2.2/bin/../lib/log4j-1.2.16.jar:/dsc-cassandra-1.2.2/bin/../lib/lz4-1.1.0.jar:/dsc-cassandra-1.2.2/bin/../lib/metrics-core-2.0.3.jar:/dsc-cassandra-1.2.2/bin/../lib/netty-3.5.9.Final.jar:/dsc-cassandra-1.2.2/bin/../lib/servlet-api-2.5-20081211.jar:/dsc-cassandra-1.2.2/bin/../lib/slf4j-api-1.7.2.jar:/dsc-cassandra-1.2.2/bin/../lib/slf4j-log4j12-1.7.2.jar:/dsc-cassandra-1.2.2/bin/../lib/snakeyaml-1.6.jar:/dsc-cassandra-1.2.2/bin/../lib/snappy-java-1.0.4.1.jar:/dsc-cassandra-1.2.2/bin/../lib/snaptree-0.1.jar org.apache.cassandra.service.CassandraDaemon
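To double-check the GC question, I'm going to turn on full GC logging along these lines. This is only a rough sketch: the flags are standard HotSpot options appended to JVM_OPTS in conf/cassandra-env.sh, and the log path is just an example of a writable location.

  # Rough sketch of additions to conf/cassandra-env.sh (log path is an example)
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
  JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

That should timestamp every collection and stop-the-world pause, so I can line them up against the periods where writes stall.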
Thanks,
James

-----Original Message-----
From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 14 June 2013 17:17
To: user@cassandra.apache.org
Subject: Re: Cassandra periodically stops responding to write requests under load

On Fri, Jun 14, 2013 at 7:19 AM, James Lee <james....@metaswitch.com> wrote:
> I'm seeing generally good performance, but with periods where the
> Cassandra node entirely stops responding to write requests for several
> seconds at a time. I don't have much experience of Cassandra
> performance tuning, and would very much appreciate some pointers on
> what I can do to improve matters.

It is relatively common for a Cassandra node to become unresponsive for a few seconds when doing various things. However, as one usually has multiple replicas for any given key, this transient unavailability does not meaningfully impact overall availability. Pausing for more than a few seconds is relatively uncommon and probably does indicate either suboptimal configuration or excessive workload.

> -- I've used a RAID array for the data directory to improve write
> performance. This significantly reduces the length of the slow period
> (from ~10s to ~2s), but doesn't eliminate it. I've tried RAID10 and
> RAID0 using varying numbers of drives, but there doesn't seem to be a
> significant difference between the two.

Do you see disk saturation when you're flushing? This statement suggests that you might be.

> -- I've used multiple drives for the data directory, symlinking the
> directories for different keyspaces to different drives. That didn't
> improve things significantly compared to using a single drive.

I would not expect this to improve things if you are bounded on how quickly you can flush from a single thread.

Stock questions:

1) What JVM?
2) What heap settings?
3) Do you also see GC logs around flush time?
4) Are you testing a single node only?

=Rob
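P.S. On the disk saturation question above: I haven't measured that properly yet, but something along these lines should show whether the data drives are maxed out while memtables flush. This is only a sketch -- iostat comes from the sysstat package, the one-second interval is arbitrary, and I believe the flush pool shows up as FlushWriter in nodetool tpstats on 1.2:

  # Per-device utilisation/latency, refreshed every second, while the load test runs
  iostat -x 1

  # In another terminal (assuming nodetool is on the path): watch for pending/blocked flush tasks
  watch -n 1 nodetool tpstats

If %util sits near 100% (or await spikes) on the data drives at the same time as FlushWriter shows pending or blocked tasks, that would point at the disks saturating during flushes.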