Thanks a lot for your reply, guys. I was trying fsync = batch and window = 0ms to see whether my drive was being fully utilized. Checking with iostat, disk utilization was around 60%, and CPU usage was also not too high.
Configuration of my setup: I have three m1.xlarge hosts, each with 15 GB RAM and 4 CPUs (8 EC2 Compute Units). I have set the replication factor to 3. The typical write size is 1 KB. When I tried adding more client nodes, each running 200 threads, the throughput just got split between them. Running from a single host with FSync set to periodic, a window of 1000 ms, and two client nodes, I get these numbers:

Node 1:
[OVERALL], Throughput(ops/sec), 4771
[INSERT], AverageLatency(us), 18747
[INSERT], MinLatency(us), 1470
[INSERT], MaxLatency(us), 446413
[INSERT], 95thPercentileLatency(ms), 55
[INSERT], 99thPercentileLatency(ms), 167

Node 2:
[OVERALL], Throughput(ops/sec), 4678
[INSERT], AverageLatency(us), 22015
[INSERT], MinLatency(us), 1439
[INSERT], MaxLatency(us), 466149
[INSERT], 95thPercentileLatency(ms), 62
[INSERT], 99thPercentileLatency(ms), 171

Is there something I am doing wrong in my Cassandra setup? What is the best setup for Cassandra to get high throughput and good write latency numbers?

On Tue, Jul 17, 2012 at 7:02 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> FSync = Batch and Window = 0ms is expected to give relatively poor
> results. It means C* will fsync to disk on pretty much every write. This
> is an overly safe setting, and no database with that kind of setting will
> perform well because you're far too bound by the hard drive.
>
> If you want strong local durability, use Batch (so that C* never acks a
> non-fsynced write) but keep a bigger window. In any case, Periodic will
> give you better results, and provided you use a replication factor > 1,
> it is good enough in 99% of cases.
>
> As for the exact numbers, you didn't even say what kind of instance you
> are using, nor the replication factor, nor the typical size of each
> write, so it's hard to tell you whether they seem reasonable or not.
>
> As for the scalability, as horschi said, it's about adding nodes, not
> adding clients.
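For anyone following the thread, the Batch/Periodic settings being discussed correspond to the commitlog options in cassandra.yaml. A sketch of the two configurations (option names as in Cassandra 1.x; the 50 ms batch window below is an illustrative value, not a recommendation):

```yaml
# Periodic mode: writes are acked immediately; the commitlog is fsynced
# every commitlog_sync_period_in_ms. Writes in the window can be lost on
# a crash of that node, but replication (RF > 1) usually covers this.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1000

# Batch mode alternative: a write is not acked until the commitlog is
# fsynced; writes arriving within the window are grouped into one fsync.
# A window of 0 ms means essentially one fsync per write.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```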
> --
> Sylvain
>
> On Tue, Jul 17, 2012 at 3:43 PM, horschi <hors...@gmail.com> wrote:
>
>> When they say "linear scalability" they mean "throughput scales with the
>> number of machines in your cluster".
>>
>> Try adding more machines to your cluster and measure the throughput. I'm
>> pretty sure you'll see linear scalability.
>>
>> regards,
>> Christian
>>
>> On Tue, Jul 17, 2012 at 6:13 AM, Code Box <codeith...@gmail.com> wrote:
>>
>>> I am benchmarking Cassandra with YCSB to find the best performance for
>>> my application, which will be both read and write intensive. I have set
>>> up a three-node cluster on EC2, and I am running YCSB as a client in the
>>> same availability region. I have tried various combinations of Cassandra
>>> tuning parameters, such as FSync (set to batch and periodic), increasing
>>> the number of rpc_threads, increasing the number of concurrent reads and
>>> concurrent writes, and write consistency ONE and QUORUM, but I am not
>>> getting very good results, and I also do not see linear scalability:
>>> if I increase the number of clients, I do not see an increase in
>>> throughput.
>>>
>>> Here are some sample numbers that I got:
>>>
>>> Test 1: Write consistency = QUORUM, write proportion = 100%,
>>> FSync = Batch, Window = 0 ms
>>>
>>> Threads  Throughput (writes/sec)  Avg (ms)  TP95 (ms)  TP99 (ms)  Min (ms)  Max (ms)
>>> 10       2149                     3.198     4          5          1.49      9291
>>> 100      4070                     23.8      28         70         2.2       260
>>> 200      4151                     45.96     57         130        1.7       1242
>>> 300      4197                     64.68     115        422        2.09      216
>>>
>>> If you look at the numbers, increasing the number of threads does not
>>> increase the throughput. The latency values are also not that great. I
>>> am using fsync set to batch with a 0 ms window.
>>>
>>> Test 2: Write consistency = QUORUM, write proportion = 100%,
>>> FSync = Periodic, Window = 1000 ms
>>>
>>> Threads  Throughput (writes/sec)  Avg (ms)  TP95 (ms)  TP99 (ms)  Min (ms)  Max (ms)  Consistency
>>> 1        803                      1.23      7          12         1.01      2312.9    Q
>>> 100      15944                    5.343     9          25         1.2       1579.1    Q
>>> 200      19630                    9.047     19         70         1.17      1851      Q
>>>
>>> Are these numbers expected, or does Cassandra perform better? Am I
>>> missing something?
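One quick cross-check for benchmark tables like these is Little's law: the average number of in-flight requests is roughly throughput times mean latency. A minimal sketch, using two approximate figures read from the Test 2 run above:

```python
# Little's law sanity check: in-flight requests ~= throughput * mean latency.
# Figures are approximate values from the Test 2 (periodic fsync) run.
runs = [
    {"threads": 1,   "ops_per_sec": 803,   "avg_latency_ms": 1.23},
    {"threads": 200, "ops_per_sec": 19630, "avg_latency_ms": 9.047},
]

for r in runs:
    # Convert latency to seconds so the product is a dimensionless count.
    in_flight = r["ops_per_sec"] * r["avg_latency_ms"] / 1000.0
    print(f'{r["threads"]} client threads -> ~{in_flight:.0f} requests in flight')
```

With 1 thread this gives ~1 request in flight and with 200 threads ~178, reasonably close to the thread counts. When the estimate falls well below the number of client threads, the threads are spending part of their time outside requests (client-side coordination, GC, connection handling), which caps throughput no matter how the server is tuned.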