Thanks for the suggestion. I was able to get better results by tuning the GC settings, but they are still not great. I was reading the Netflix blog for the settings they used and have posted there, but I could not get close to the numbers they report:
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

On Thu, Jul 19, 2012 at 9:45 PM, aaron morton <aa...@thelastpickle.com> wrote:

>> Three node cluster with replication factor of 3 gets me around 10 ms for
>> 100% writes with consistency equal to ONE. The reads are really bad and
>> they are around 65 ms.
>
> Using CL ONE in that situation, with a test that runs in a tight loop, can
> result in the clients overloading the cluster.
>
> Every node is a replica, so a write at CL ONE only has to wait for the
> local node to ACK. It will then return to the client before the remote
> nodes ACK, which means the client can send another request very quickly.
> In normal operation this may not be an issue, but load tests that run in
> a tight loop do not generate normal traffic.
>
> A better approach is to work at QUORUM so that network latency slows down
> individual client threads. Or generate the traffic using a Poisson
> distribution. The new load test from Twitter uses that:
> https://github.com/twitter/iago/, or you can use numpy for Python.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
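A minimal sketch of the Poisson-paced traffic Aaron describes, in plain Java: the client sleeps for exponentially distributed gaps between requests, so arrivals form a Poisson process instead of a tight loop. The target rate, request count and the doInsert() stub are illustrative placeholders, not part of YCSB or the stress tool.

import java.util.Random;
import java.util.concurrent.TimeUnit;

public class PoissonPacedLoad {

    // Illustrative values: offered load and total test size.
    private static final double TARGET_OPS_PER_SECOND = 5000.0;
    private static final int TOTAL_REQUESTS = 100000;

    public static void main(String[] args) throws InterruptedException {
        Random random = new Random();
        for (int i = 0; i < TOTAL_REQUESTS; i++) {
            // Exponentially distributed inter-arrival gap => Poisson arrivals.
            double gapSeconds =
                    -Math.log(1.0 - random.nextDouble()) / TARGET_OPS_PER_SECOND;
            TimeUnit.NANOSECONDS.sleep((long) (gapSeconds * 1e9));
            doInsert();
        }
    }

    // Placeholder: issue one write with whichever client is being tested.
    private static void doInsert() {
    }
}

Paced this way, the offered load no longer depends on how quickly each response comes back, so a slow node shows up as higher measured latency instead of silently throttling the test the way a tight loop does.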
> On 18/07/2012, at 11:29 PM, Manoj Mainali wrote:
>
> What kind of client are you using in YCSB? If you want to improve latency,
> try distributing the requests among the nodes instead of stressing a
> single node, and try host connection pooling instead of creating a
> connection for each request. Check out high level clients like Hector or
> Astyanax if you are not already using them. Some clients have ring-aware
> request handling.
>
> You have a 3 node cluster and are using an RF of three, which means every
> node will get the data. What CL are you using for writes? Latency
> increases for stronger CLs.
>
> If you want to increase throughput, try increasing the number of clients.
> Of course, that doesn't mean throughput will always increase. My
> observation was that it increases up to a certain number of clients and
> then decreases again.
>
> Regards,
> Manoj Mainali
>
> On Wednesday, July 18, 2012, Code Box wrote:
>
>> The cassandra stress tool gives me values around 2.5 milliseconds for
>> writing. The problem with the stress tool is that it only gives the
>> average latency numbers, and the average latency numbers I am getting
>> are comparable in some cases. It is the 95th percentile and 99th
>> percentile numbers that are bad. In other words, most requests are fast
>> and pull the average down, but the slowest 5% are really bad. I want to
>> make sure that the 95th and 99th percentile values are in single-digit
>> milliseconds, because I have seen people getting those numbers.
>>
>> This is my conclusion so far from all the investigations:
>>
>> Three node cluster with replication factor of 3 gets me around 10 ms for
>> 100% writes with consistency equal to ONE. The reads are really bad and
>> they are around 65 ms.
>>
>> I thought that the network was the issue, so I moved the client to a
>> local machine. The client on the local machine with a one node cluster
>> again gives me good average write latencies, but the 99th and 95th
>> percentiles are bad. I am getting around 10 ms for writes and 25 ms for
>> reads.
>>
>> Network bandwidth between the client and server is 1 Gigabit/second. I
>> was able to generate at most around 25K requests, so it could be that
>> the client is the bottleneck. I am using YCSB. Maybe I should change my
>> client to some other one.
>>
>> The maximum throughput that I got from a client was 35K locally and 17K
>> remotely.
>>
>> I can try these things now:
>>
>> Use a different client and see what numbers I get for the 99th and 95th
>> percentiles. I am not sure if there is any client that gives me this
>> much detail, or whether I have to write one of my own.
>>
>> Tweak some hard disk settings (RAID0 and xfs / ext4) and see if that
>> helps.
>>
>> It could be that from Cassandra 0.8 to 1.1 the 95th and 99th percentile
>> numbers have regressed. The throughput numbers have also gone down.
>>
>> Is there any other client that I can use apart from the cassandra stress
>> tool and YCSB, and are the numbers I have got so far any good?
>>
>> --Akshat Vig.
>>
>> On Tue, Jul 17, 2012 at 9:22 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>> I would benchmark a default installation, then start tweaking. That way
>> you can see if your changes result in improvements.
>>
>> To simplify things further, try using the tools/stress utility in the
>> cassandra source distribution first. It's pretty simple to use.
>>
>> Add clients until you see the latency increase and tasks start to back
>> up in nodetool tpstats. If you see it report dropped messages it is
>> overloaded.
>>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/07/2012, at 4:48 AM, Code Box wrote:
>>
>> Thanks a lot for your replies, guys. I was trying fsync = batch with a
>> window of 0 ms to see if my drive was being fully utilized. I checked
>> the numbers using iostat: disk utilization was around 60% and the CPU
>> usage was also not too high.
>>
>> Configuration of my setup:
>>
>> I have three m1.xlarge hosts, each with 15 GB RAM and 4 CPUs (8 EC2
>> Compute Units). I have kept the replication factor equal to 3. The
>> typical write size is 1 KB.
>>
>> I tried adding different nodes, each with 200 threads, and the
>> throughput got split in two. If I do it from a single host with fsync
>> set to periodic and a window size of 1000 ms, using two nodes, I get
>> these numbers:
>>
>> [OVERALL], Throughput(ops/sec), 4771
>> [INSERT], AverageLatency(us), 18747
>> [INSERT], MinLatency(us), 1470
>> [INSERT], MaxLatency(us), 446413
>> [INSERT], 95thPercentileLatency(ms), 55
>> [INSERT], 99thPercentileLatency(ms), 167
>>
>> [OVERALL], Throughput(ops/sec), 4678
>> [INSERT], AverageLatency(us), 22015
>> [INSERT], MinLatency(us), 1439
>> [INSERT], MaxLatency(us), 466149
>> [INSERT], 95thPercentileLatency(ms), 62
>> [INSERT], 99thPercentileLatency(ms), 171
>>
>> Is there something I am doing wrong in my Cassandra setup? What is the
>> best setup for Cassandra to get high throughput and good write latency
>> numbers?
>>
>> On Tue, Jul 17, 2012 at 7:02 AM, Sylvain Lebresne <sylv...@datastax.com>
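To illustrate Manoj's point above about host connection pooling and ring-aware clients, together with Aaron's advice to work at QUORUM, here is a rough Hector-based sketch. It is written from memory of the Hector 1.x API, so treat the class and method names as assumptions to verify; the host list, pool size, keyspace ("usertable") and column family ("data") are placeholders.

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class PooledQuorumWriter {
    public static void main(String[] args) {
        // Pool connections across all three nodes instead of hammering one host.
        CassandraHostConfigurator hosts =
                new CassandraHostConfigurator("node1:9160,node2:9160,node3:9160");
        hosts.setMaxActive(50);            // connections kept per host
        hosts.setAutoDiscoverHosts(true);  // pick up the rest of the ring
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);

        // QUORUM on reads and writes so each client thread pays real latency.
        ConfigurableConsistencyLevel cl = new ConfigurableConsistencyLevel();
        cl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        cl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace keyspace = HFactory.createKeyspace("usertable", cluster, cl);

        // One sample write; a load test would do this from many pooled threads.
        Mutator<String> mutator =
                HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert("user1", "data",
                HFactory.createStringColumn("field0", "value0"));
    }
}

Astyanax exposes similar knobs through its connection pool configuration, including a token-aware mode.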
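On the question of whether any client reports the detailed 95th/99th percentile numbers or whether one has to be written from scratch: a hand-rolled client mainly needs a small latency recorder, roughly like the plain-Java sketch below (the capacity and microsecond units are arbitrary choices, and nothing here comes from YCSB or the stress tool).

import java.util.Arrays;

// Collect per-request latencies and report average, p95 and p99.
public class LatencyStats {
    private final long[] samplesMicros;
    private int count;

    public LatencyStats(int capacity) {
        this.samplesMicros = new long[capacity];
    }

    public void record(long latencyMicros) {
        if (count < samplesMicros.length) {
            samplesMicros[count++] = latencyMicros;
        }
    }

    public void report() {
        long[] sorted = Arrays.copyOf(samplesMicros, count);
        Arrays.sort(sorted);
        long sum = 0;
        for (long v : sorted) {
            sum += v;
        }
        System.out.printf("avg=%d us, p95=%d us, p99=%d us%n",
                count == 0 ? 0 : sum / count,
                percentile(sorted, 0.95),
                percentile(sorted, 0.99));
    }

    // Nearest-rank percentile on the sorted samples.
    private static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            return 0;
        }
        int index = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }
}

Each worker thread can keep its own LatencyStats instance (the sketch is not thread-safe), time each request with System.nanoTime(), and record() the elapsed microseconds; report() at the end shows whether the tail, rather than the average, is where the latency lives.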