> 12G Heap, 1600Mb Young gen

That is a bit higher than the normal recommendation, and a 1600MB young gen can cause some extra ParNew pauses.
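If you want to experiment with the sizing, both values are set in conf/cassandra-env.sh. A minimal sketch, assuming the stock 1.0.x script; the sizes below are illustrative starting points to test against your GC logs, not recommendations:

    # conf/cassandra-env.sh -- uncomment to override the script's auto-sizing.
    # Sizes here are illustrative starting points only.

    # Total heap; very large CMS heaps tend to trade pause frequency for pause length.
    MAX_HEAP_SIZE="8G"

    # A smaller new gen usually means shorter (but more frequent) ParNew pauses.
    HEAP_NEWSIZE="800M"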
> 128 Concurrent writer threads

Unless you are on SSD this is too many.

> 1) Is using JDK 1.7 any way detrimental to cassandra?

As far as I know it's not fully certified, thanks for trying it :)

> 2) What is the max write operation qps that should be expected. Is the
> netflix benchmark also applicable for counter incrementing tasks?

Counters use a different write path than normal writes and are a bit slower.

To benchmark, get a single node and work out the max throughput, then multiply by the number of nodes and divide by the RF to get a rough idea (there is a back-of-the-envelope sketch at the end of this mail).

> the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%)

Wait until compaction kicks in and the node has to handle all your inserts at the same time.

> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.

The node is overloaded. What is the write latency for a single thread doing a single increment against a node that has no other traffic?

The latency for a request is the time spent working plus the time spent waiting; once you reach the max throughput, the time spent waiting increases. The SEDA architecture is designed to limit the time spent working.

> At this point suddenly, Several nodes start dropping several
> "Mutation" messages. There are also lots of pending

The cluster is overwhelmed.

> Almost all the new threads seem to be named
> "pool-2-thread-*".

These are client connection threads.

> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes.

Yes.
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214

Work out the latency for a single client against a single node, then start adding replication, nodes and load. When the latency increases you are getting to the max throughput for that config.

Hope that helps

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 6:49 PM, rohit bhatia wrote:

> Our Cassandra cluster consists of 8 nodes (16 core, 32G RAM, 12G heap,
> 1600MB young gen, cassandra 1.0.5, JDK 1.7, 128 concurrent writer
> threads). The replication factor is 2 with 10 column families and we
> service counter-incrementing, write-intensive tasks (CL=ONE).
>
> I am trying to figure out the bottleneck,
>
> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> 2) What is the max write operation qps that should be expected. Is the
> netflix benchmark also applicable for counter incrementing tasks?
>
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
> 3) At around 50,000 qps for the cluster (~12,500 qps per node), the cpu
> idle time is around 30%, cassandra is not disk bound (insignificant
> read operations and cpu's iowait is around 0.05%) and is not swapping
> its memory (around 15 GB RAM is free or inactive). The average gc pause
> time for parnew is 100ms, occurring every second, so cassandra spends
> 10% of its time stuck in the "stop the world" collector.
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> At this point suddenly, several nodes start dropping several
> "Mutation" messages. There are also lots of pending
> MutationStage, replicateOnWriteStage tasks in tpstats.
> The number of threads in the java process increases to around 25,000
> from the usual 300-400. Almost all the new threads seem to be named
> "pool-2-thread-*".
> The OS load jumps to around 30-40, the "write request latency" starts
> spiking to more than 500ms (even to several tens of seconds sometimes).
> Even the "local write latency" increases fourfold, to 200 microseconds
> from 50 microseconds. This happens across all the nodes and within around
> 2-3 minutes.
> My guess is that this might be due to the 128 writer threads not being
> able to perform more writes (though with an average local write latency
> of 100-150 microseconds, each thread should be able to serve 10,000
> qps, and with 128 writer threads, should be able to serve 1,280,000 qps
> per node).
> Could there be any other reason for this? What else should I monitor,
> since system.log does not seem to say anything conclusive before
> dropping messages?
>
> Thanks
> Rohit
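PS - the back-of-the-envelope sketch mentioned above, written out as a few lines of Python. The numbers in it are placeholders for illustration, not measurements from this thread; measure the single-node figure yourself first.

    # Rough capacity arithmetic only, not a benchmark.
    # single_node_qps: measure with one client against one node with no other traffic.

    def cluster_write_qps(single_node_qps, nodes, rf):
        # Each write lands on rf replicas, so divide the aggregate by rf.
        return single_node_qps * nodes / rf

    def writer_thread_ceiling(local_write_latency_seconds, writer_threads):
        # Theoretical per-node max if the writer threads were the only limit.
        # In practice queueing, GC pauses and replicate-on-write add to the
        # request latency well before this point.
        return writer_threads / local_write_latency_seconds

    # Placeholder numbers (assumptions, not figures from this cluster):
    print(cluster_write_qps(single_node_qps=20000, nodes=8, rf=2))   # 80000.0
    print(writer_thread_ceiling(0.00015, 128))                       # ~853333

If the cluster tops out well below what the first number suggests, most of the request latency is time spent waiting rather than working, which matches the dropped Mutation messages and the pending MutationStage / ReplicateOnWrite tasks you are seeing.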