If you double your nodes, you should be doubling your webservers too (that is, if you are trying to prove it scales linearly). We had to spend time finding the correct ratio for our application (it ended up being 19 webservers to 20 data nodes), so for now just assume 1 to 1… You can use Amazon to find that out very cheaply.
Dean

From: Anand Somani <meatfor...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, April 4, 2013 1:05 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Linear scalability problems

RF=3.

On Thu, Apr 4, 2013 at 7:08 AM, Cem Cayiroglu <cayiro...@gmail.com> wrote:

What was the RF before adding nodes?

On 04 Apr 2013, at 15:12, Anand Somani <meatfor...@gmail.com> wrote:

We are using a single process with multiple threads; will look at client-side delays. Thanks

On Wed, Apr 3, 2013 at 9:30 AM, Tyler Hobbs <ty...@datastax.com> wrote:

If I had to guess, I would say that your client is the bottleneck, not the cluster. Are you inserting data with multiple threads or processes?

On Wed, Apr 3, 2013 at 8:49 AM, Anand Somani <meatfor...@gmail.com> wrote:

Hi,

I am running some tests trying to scale out our application from a 3-node cluster to a 6-node cluster. With the 3-node cluster I was able to handle about 41 requests/second, so I added 3 more nodes expecting throughput to roughly double, but instead it only goes up to about 47 requests/second!

I am doing something wrong and it is not obvious what, so I wanted some help: what stats could/should I monitor to tell, for example, whether one node is getting more requests than the others or whether the load distribution is not random enough?

Note: I am using direct Thrift (old code base) and Cassandra 1.1.6. The data model stores blobs (split across columns) and has around 6 CFs, RF=3, and all operations are at QUORUM. Also, at the end of the run nodetool ring reports the same data size.

Thanks
Anand

--
Tyler Hobbs
DataStax <http://datastax.com/>
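Tyler's guess and Dean's advice both rest on a simple ceiling: a client that keeps at most N requests in flight, each taking L seconds end to end, cannot push more than N / L requests per second (Little's law), however many Cassandra nodes sit behind it. The sketch below is illustrative only, assuming per-request latency stays roughly constant; the function names and numbers are hypothetical, not measurements from this thread.

```python
import math


def max_client_throughput(concurrency: int, latency_s: float) -> float:
    """Upper bound on requests/second for one client (Little's law).

    `concurrency` is the most requests the client keeps in flight at once
    (e.g. worker threads); `latency_s` is the end-to-end time per request.
    Assumes latency does not grow as load increases.
    """
    if concurrency <= 0 or latency_s <= 0:
        raise ValueError("concurrency and latency_s must be positive")
    return concurrency / latency_s


def clients_needed(target_rps: float, concurrency: int, latency_s: float) -> int:
    """Hypothetical helper: identical client processes/machines needed to offer target_rps."""
    return math.ceil(target_rps / max_client_throughput(concurrency, latency_s))


if __name__ == "__main__":
    # Purely hypothetical numbers, not taken from the thread.
    print(max_client_throughput(concurrency=8, latency_s=0.2))
    print(clients_needed(target_rps=80, concurrency=8, latency_s=0.2))
```

For example, a hypothetical client with 8 threads and 200 ms per request tops out at roughly 40 req/s, so doubling the cluster changes little unless client concurrency (threads, processes, or machines) grows with it.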
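On Anand's question about load distribution: Cassandra 1.1 has no virtual nodes, so each node owns a single token range, and an unevenly spaced ring after expansion sends more requests to some nodes than others. Assuming RandomPartitioner (the thread does not say which partitioner is in use), whose token space runs from 0 to 2**127, evenly spaced tokens and the resulting ownership fractions can be sketched as follows and compared with the ownership that nodetool ring reports. The helper names are hypothetical.

```python
from typing import List

# Assumption: RandomPartitioner, whose token space runs from 0 to 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens for a one-token-per-node ring."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


def ownership(tokens: List[int]) -> List[float]:
    """Fraction of the ring owned by each token, in sorted token order.

    A node owns the range (previous token, its own token]; the lowest
    token wraps around to the highest one.
    """
    ts = sorted(tokens)
    if len(ts) == 1:
        return [1.0]
    return [((t - ts[i - 1]) % RING_SIZE) / RING_SIZE for i, t in enumerate(ts)]


if __name__ == "__main__":
    for n in (3, 6):
        print(n, "nodes:", balanced_tokens(n))
        print("  ownership:", [round(f, 4) for f in ownership(balanced_tokens(n))])
```

When going from 3 to 6 nodes, the even-indexed 6-node tokens equal the 3-node tokens exactly, so the original nodes can keep their tokens and the new nodes take the odd-indexed values (set via `initial_token` in cassandra.yaml before they bootstrap).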