On Fri, Jul 6, 2012 at 4:47 AM, aaron morton <aa...@thelastpickle.com> wrote:
> 12G Heap,
> 1600Mb Young gen,
>
> Is a bit higher than the normal recommendation. 1600MB young gen can cause
> some extra ParNew pauses.

Thanks for the heads up, I'll try tinkering with this.
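For reference, these are the knobs I would be tinkering with. A minimal
sketch, assuming the stock conf/cassandra-env.sh shipped with the 1.0.x
tarball (MAX_HEAP_SIZE and HEAP_NEWSIZE are the standard overrides there;
the values are simply what we run today):

    # conf/cassandra-env.sh -- explicit sizes instead of the auto-computed ones
    MAX_HEAP_SIZE="12G"    # total heap, as described above
    HEAP_NEWSIZE="1600M"   # young gen; lowering this should shorten each
                           # individual ParNew pause, at the cost of the
                           # collections running more often

I'll try stepping HEAP_NEWSIZE down and watch the per-collection ParNew times.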
>
> 128 Concurrent writer
> threads
>
> Unless you are on SSD this is too many.

I mean http://www.datastax.com/docs/0.8/configuration/node_configuration#concurrent-writes
(concurrent_writes), not the memtable flush queue writers. The suggested
value there is 8 * number of cores (16) = 128 itself.

> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> as far as I know it's not fully certified, thanks for trying it :)
>
> 2) What is the max write operation qps that should be expected. Is the
> netflix benchmark also applicable for counter incrementing tasks?
>
> Counters use a different write path than normal writes and are a bit slower.
>
> To benchmark, get a single node and work out the max throughput. Then
> multiply by the number of nodes and divide by the RF to get a rough idea.
>
> the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%)
>
> Wait until compaction kicks in and you handle all your inserts.
>
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> The node is overloaded. What is the write latency for a single thread doing
> a single increment against a node that has no other traffic? The latency
> for a request is the time spent working and the time spent waiting; once you
> reach the max throughput the time spent waiting increases. The SEDA
> architecture is designed to limit the time spent working.
>
> At this point suddenly, several nodes start dropping several
> "Mutation" messages. There are also lots of pending
>
> The cluster is overwhelmed.
>
> Almost all the new threads seem to be named
> "pool-2-thread-*".
>
> These are client connection threads.
>
> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes.(
>
> Yes.
> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214
>
> Work out the latency for a single client against a single node, then start
> adding replication, nodes and load. When the latency increases you are
> getting to the max throughput for that config.

Also, as mentioned in my second mail, I am seeing messages like
"Total time for which application threads were stopped: 16.7663710 seconds".
If a node pauses for this long, it might be overwhelmed by the hints stored
at other nodes once it comes back. This can further cause the node to wait
on/drop a lot of client connection threads. I'll look into what is causing
these non-GC pauses. Thanks for the help.
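To track down those non-GC pauses, my plan is to append the following to
JVM_OPTS in cassandra-env.sh. These are standard HotSpot flags on JDK 7 (the
"Total time for which application threads were stopped" lines themselves come
from -XX:+PrintGCApplicationStoppedTime); the log path is just a placeholder:

    # report every safepoint pause and which VM operation triggered it,
    # not only the GC-induced ones
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
    # keep timestamped GC details in a separate log for correlation
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

That should at least tell us whether the 16 second stop is a real collection
or some other safepoint operation.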
>
> Hope that helps
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/07/2012, at 6:49 PM, rohit bhatia wrote:
>
> Our Cassandra cluster consists of 8 nodes (16 core, 32G ram, 12G Heap,
> 1600Mb Young gen, cassandra 1.0.5, JDK 1.7, 128 Concurrent writer
> threads). The replication factor is 2 with 10 column families and we
> service Counter incrementing write intensive tasks (CL=ONE).
>
> I am trying to figure out the bottleneck,
>
> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> 2) What is the max write operation qps that should be expected. Is the
> netflix benchmark also applicable for counter incrementing tasks?
>
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
> 3) At around 50,000 qps for the cluster (~12,500 qps per node), the cpu
> idle time is around 30%, cassandra is not disk bound (insignificant
> read operations and cpu's iowait is around 0.05%) and is not swapping
> its memory (around 15 gb RAM is free or inactive). The average gc pause
> time for parnew is 100ms, occurring every second, so cassandra spends
> 10% of its time stuck in the "Stop the world" collector.
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> At this point suddenly, several nodes start dropping several
> "Mutation" messages. There are also lots of pending
> MutationStage and replicateOnWriteStage tasks in tpstats.
> The number of threads in the java process increases to around 25,000
> from the usual 300-400. Almost all the new threads seem to be named
> "pool-2-thread-*".
> The OS load jumps to around 30-40, the "write request latency" starts
> spiking to more than 500ms (even to several tens of seconds sometimes).
> Even the "Local write latency" increases fourfold, to 200 microseconds
> from 50 microseconds. This happens across all the nodes within around
> 2-3 minutes.
> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes (though with an average local write latency
> of 100-150 microseconds, each thread should be able to serve ~10,000
> qps, and with 128 writer threads, a node should be able to serve
> 1,280,000 qps).
> Could there be any other reason for this? What else should I monitor,
> since system.log does not seem to say anything conclusive before
> dropping messages?
>
>
> Thanks
> Rohit
>
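PS: on the "what else should I monitor" point above, these are the nodetool
views I am going to watch on each node while reproducing the drops (plain
1.0.x nodetool commands; the host is a placeholder):

    # per-stage active/pending/completed counts; MutationStage and
    # ReplicateOnWriteStage are the interesting ones here
    nodetool -h 127.0.0.1 tpstats
    # whether compaction starts falling behind once it kicks in under the insert load
    nodetool -h 127.0.0.1 compactionstats
    # per column family write latency and memtable sizes
    nodetool -h 127.0.0.1 cfstats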