The write path for counters is different from the one for non-counter fields; for background see http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf

The write is applied on the leader *and then* replicated to the other replicas. This was controlled by a config setting called replicate_on_write, which IIRC has been removed because you always want to do this. You can see this traffic in the REPLICATE_ON_WRITE thread pool.
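As an aside, the leader-then-replicate flow above can be sketched in a few lines. This is an illustrative model only, not Cassandra code: the Replica class and increment function are made-up names, and the real replication is asynchronous background traffic (what shows up in the REPLICATE_ON_WRITE stage), not a sequential loop.

```python
# Hypothetical sketch of the counter write path: the increment is applied
# on the leader replica first, and only then is the resulting delta
# replicated to the other replicas. Names are illustrative, not APIs.

class Replica:
    def __init__(self, name):
        self.name = name
        self.counts = {}                     # counter name -> local count

    def apply_delta(self, counter, delta):
        self.counts[counter] = self.counts.get(counter, 0) + delta

def increment(leader, other_replicas, counter, delta):
    # 1. The leader applies the write locally first...
    leader.apply_delta(counter, delta)
    # 2. ...then the delta is replicated to the remaining replicas
    #    (asynchronous in the real system; sequential here for clarity).
    for replica in other_replicas:
        replica.apply_delta(counter, delta)

a, b = Replica("a"), Replica("b")
increment(a, [b], "page_views", 1)
increment(a, [b], "page_views", 1)
print(a.counts["page_views"], b.counts["page_views"])   # 2 2
```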
Have a look at the ROW stage and see if it is backing up.

> 1) Is the whole of 7-8ms being spent in thrift overheads and
> Scheduling delays ? (there is insignificant .1ms ping time between
> machines)

The storage proxy / JMX latency is the total latency for the coordinator after the thrift deserialisation (and before serialising the response). 7 to 8 ms sounds a little high considering the low local node latency, but it would make sense if the nodes were at peak throughput. At max throughput, request latency is wait time + processing time. What happens to node-local latency and cluster latency when the throughput goes down?

Also, this will be responsible for some of that latency…

> (GC
> stops threads for 100ms every 1-2 seconds, effectively pausing
> cassandra 5-10% of its time, but this doesn't seem to be the reason)

> 2) Does keeping a large number of CFs (17 in our case) adversely affect
> write performance? (except from the extreme flushing scenario)

Should be fine with 17.

> 3) I see a lot of threads (4,000-10,000) with names like
> "pool-2-thread-*"

These are connection threads. Use connection pooling on the client, or try the thread-pooled connection manager; see the yaml for details.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/07/2012, at 3:48 PM, rohit bhatia wrote:

> Hi
>
> As I understand it, writes in cassandra are pushed directly to memory,
> and using counters with CL.ONE shouldn't take the read latency for
> counters into account. So writes incrementing counters with CL.ONE
> should basically be really fast.
>
> But in my 8 node cluster (16 core/32G ram/cassandra 1.0.5/java7 each)
> with RF=2, at a traffic of 55k qps = 14k increments per node/7k write
> requests per node, the write latency (from JMX) increases to around
> 7-8 ms from the low-traffic value of 0.5 ms. The nodes aren't even
> pushed, with no I/O pressure, lots of free RAM and 30% CPU idle
> time/OS load 20.
> The write latency by cfstats (supposedly the latency for 1 node to
> increment its counter) is a small amount (< 0.05 ms).
>
> 1) Is the whole of 7-8ms being spent in thrift overheads and
> scheduling delays? (there is insignificant .1ms ping time between
> machines)
>
> 2) Does keeping a large number of CFs (17 in our case) adversely affect
> write performance? (except from the extreme flushing scenario)
>
> 3) I see a lot of threads (4,000-10,000) with names like
> "pool-2-thread-*" (pointed out as client-connection threads on the
> mailing list before) periodically forming up. But with idle CPU time
> and zero pending tasks in tpstats, why do requests keep piling up? (GC
> stops threads for 100ms every 1-2 seconds, effectively pausing
> cassandra 5-10% of its time, but this doesn't seem to be the reason)
>
> Thanks
> Rohit
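A back-of-envelope sketch of the "at max throughput, request latency is wait time + processing time" point, using a simple M/M/1 queue model. The numbers are illustrative assumptions, not measurements from this cluster, but they show how a 0.5 ms service time can stretch to 7-8 ms purely from queueing near saturation, and that the quoted GC figures are self-consistent.

```python
# M/M/1 mean time in system = service_time / (1 - utilisation),
# i.e. waiting time plus processing time. Illustrative model only.

def mm1_latency_ms(service_ms, utilisation):
    """Mean request latency (wait + service) for an M/M/1 queue."""
    assert 0 <= utilisation < 1
    return service_ms / (1.0 - utilisation)

service_ms = 0.5                         # the low-traffic (no-wait) latency
for rho in (0.1, 0.5, 0.9, 0.93):
    print(rho, round(mm1_latency_ms(service_ms, rho), 2))
# At ~93% utilisation the same 0.5 ms of work already takes ~7 ms,
# roughly the jump reported under load.

# The GC numbers in the question are also self-consistent:
# a 100 ms stop-the-world pause every 1-2 s pauses the JVM 5-10%.
for period_s in (1.0, 2.0):
    print(f"{0.1 / period_s:.0%}")       # 10%, 5%
```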
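On the connection-pooling suggestion: the point is that a fixed-size shared pool caps the number of open connections (and hence server-side "pool-2-thread-*" threads), instead of opening one per request. The sketch below is a generic, hypothetical pool; FakeConnection is a stand-in, and a real Thrift client of that era (e.g. pycassa or Hector) ships its own pooled connection manager.

```python
# Minimal connection-pool sketch: N requests share a fixed set of
# connections instead of each opening their own. Illustrative only.
import queue

class FakeConnection:
    opened = 0                            # counts connections ever created
    def __init__(self):
        FakeConnection.opened += 1
    def execute(self, op):
        return f"done:{op}"

class ConnectionPool:
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(FakeConnection())
    def execute(self, op):
        conn = self._pool.get()           # blocks if all connections busy
        try:
            return conn.execute(op)
        finally:
            self._pool.put(conn)          # return the connection for reuse

pool = ConnectionPool(size=4)
results = [pool.execute(i) for i in range(100)]
print(FakeConnection.opened)              # 4: 100 requests, 4 connections
```

With per-request connections the server would see 100 connection threads here; with the pool it sees 4, which is the behaviour the pooling advice is after.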
