On Fri, Jul 6, 2012 at 4:47 AM, aaron morton <aa...@thelastpickle.com> wrote:
> 12G Heap,
> 1600Mb Young gen,
>
> Is a bit higher than the normal recommendation. 1600MB young gen can cause
> some extra ParNew pauses.

Thanks for the heads up, I'll try tinkering with this.
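For reference, these are the knobs I would be tinkering with. A minimal
sketch, assuming the stock conf/cassandra-env.sh shipped with the 1.0.x
tarball (MAX_HEAP_SIZE and HEAP_NEWSIZE are the standard overrides there;
the values are simply what we run today):

    # conf/cassandra-env.sh -- explicit sizes instead of the auto-computed ones
    MAX_HEAP_SIZE="12G"    # total heap, as described above
    HEAP_NEWSIZE="1600M"   # young gen; lowering this should shorten each
                           # individual ParNew pause, at the cost of the
                           # collections running more often

I'll try stepping HEAP_NEWSIZE down and watch the per-collection ParNew times.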
>
> 128 Concurrent writer
> threads
>
> Unless you are on SSD this is too many.

I mean http://www.datastax.com/docs/0.8/configuration/node_configuration#concurrent-writes
(concurrent_writes), not the memtable flush queue writers. The suggested
value there is 8 * number of cores (16) = 128 itself.

> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> as far as I know it's not fully certified, thanks for trying it :)
>
> 2) What is the max write operation qps that should be expected. Is the
> netflix benchmark also applicable for counter incrementing tasks?
>
> Counters use a different write path than normal writes and are a bit slower.
>
> To benchmark, get a single node and work out the max throughput. Then
> multiply by the number of nodes and divide by the RF to get a rough idea.
>
> the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%)
>
> Wait until compaction kicks in and you handle all your inserts.
>
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> The node is overloaded. What is the write latency for a single thread doing
> a single increment against a node that has no other traffic? The latency
> for a request is the time spent working and the time spent waiting; once you
> reach the max throughput the time spent waiting increases. The SEDA
> architecture is designed to limit the time spent working.
>
> At this point suddenly, several nodes start dropping several
> "Mutation" messages. There are also lots of pending
>
> The cluster is overwhelmed.
>
> Almost all the new threads seem to be named
> "pool-2-thread-*".
>
> These are client connection threads.
>
> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes.(
>
> Yes.
> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214
>
> Work out the latency for a single client against a single node, then start
> adding replication, nodes and load. When the latency increases you are
> getting to the max throughput for that config.

Also, as mentioned in my second mail, I am seeing messages like
"Total time for which application threads were stopped: 16.7663710 seconds".
If a node pauses for this long, it might be overwhelmed by the hints stored
at other nodes once it comes back. This can further cause the node to wait
on/drop a lot of client connection threads. I'll look into what is causing
these non-GC pauses. Thanks for the help.
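To track down those non-GC pauses, my plan is to append the following to
JVM_OPTS in cassandra-env.sh. These are standard HotSpot flags on JDK 7 (the
"Total time for which application threads were stopped" lines themselves come
from -XX:+PrintGCApplicationStoppedTime); the log path is just a placeholder:

    # report every safepoint pause and which VM operation triggered it,
    # not only the GC-induced ones
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
    # keep timestamped GC details in a separate log for correlation
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

That should at least tell us whether the 16 second stop is a real collection
or some other safepoint operation.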
>
> Hope that helps
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/07/2012, at 6:49 PM, rohit bhatia wrote:
>
> Our Cassandra cluster consists of 8 nodes (16 core, 32G ram, 12G Heap,
> 1600Mb Young gen, cassandra 1.0.5, JDK 1.7, 128 Concurrent writer
> threads). The replication factor is 2 with 10 column families and we
> service Counter incrementing write intensive tasks (CL=ONE).
>
> I am trying to figure out the bottleneck,
>
> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> 2) What is the max write operation qps that should be expected. Is the
> netflix benchmark also applicable for counter incrementing tasks?
>
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
> 3) At around 50,000 qps for the cluster (~12,500 qps per node), the cpu
> idle time is around 30%, cassandra is not disk bound (insignificant
> read operations and cpu's iowait is around 0.05%) and is not swapping
> its memory (around 15 gb RAM is free or inactive). The average gc pause
> time for parnew is 100ms, occurring every second, so cassandra spends
> 10% of its time stuck in the "Stop the world" collector.
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> At this point suddenly, several nodes start dropping several
> "Mutation" messages. There are also lots of pending
> MutationStage and replicateOnWriteStage tasks in tpstats.
> The number of threads in the java process increases to around 25,000
> from the usual 300-400. Almost all the new threads seem to be named
> "pool-2-thread-*".
> The OS load jumps to around 30-40, the "write request latency" starts
> spiking to more than 500ms (even to several tens of seconds sometimes).
> Even the "Local write latency" increases fourfold, to 200 microseconds
> from 50 microseconds. This happens across all the nodes within around
> 2-3 minutes.
> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes (though with an average local write latency
> of 100-150 microseconds, each thread should be able to serve ~10,000
> qps, and with 128 writer threads, a node should be able to serve
> 1,280,000 qps).
> Could there be any other reason for this? What else should I monitor,
> since system.log does not seem to say anything conclusive before
> dropping messages?
>
>
> Thanks
> Rohit
>
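PS: on the "what else should I monitor" point above, these are the nodetool
views I am going to watch on each node while reproducing the drops (plain
1.0.x nodetool commands; the host is a placeholder):

    # per-stage active/pending/completed counts; MutationStage and
    # ReplicateOnWriteStage are the interesting ones here
    nodetool -h 127.0.0.1 tpstats
    # whether compaction starts falling behind once it kicks in under the insert load
    nodetool -h 127.0.0.1 compactionstats
    # per column family write latency and memtable sizes
    nodetool -h 127.0.0.1 cfstats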