I think you actually get a really useful metric by benchmarking one machine: it tells you your cluster's theoretical maximum performance, which would be the number of nodes times the queries per second a single node can handle. Yes, adding in replication and CL is important, but one machine lets you isolate certain performance metrics.
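To make the "nodes times per-node queries" arithmetic concrete, here is a minimal Python sketch of that idealized ceiling. It assumes perfectly even data distribution and no coordinator, network, or compaction overhead. The function names and the 10,000 ops/s per-node figure are hypothetical illustrations, not measurements from this thread. It also folds in replication: Cassandra sends every write to all RF replicas regardless of CL, so the write ceiling is divided by RF, while a read at a given CL must (roughly) be answered by that many replicas.

```python
def upper_bound_ops(nodes: int, per_node_ops: float, replicas_per_op: int = 1) -> float:
    """Idealized cluster-wide ops/sec ceiling: nodes * per_node_ops / replicas_per_op.

    Assumes perfectly even load and that each operation costs one unit of
    work on every replica it touches (no coordinator or network overhead).
    """
    if nodes < 1 or per_node_ops < 0 or replicas_per_op < 1:
        raise ValueError("nodes and replicas_per_op must be >= 1, per_node_ops >= 0")
    if replicas_per_op > nodes:
        raise ValueError("an operation cannot touch more replicas than there are nodes")
    return nodes * per_node_ops / replicas_per_op


def quorum(rf: int) -> int:
    """Replicas needed for QUORUM: a majority of RF."""
    return rf // 2 + 1


def write_ceiling(nodes: int, per_node_ops: float, rf: int) -> float:
    # Writes are sent to all RF replicas whatever the CL; the CL only
    # decides how many acknowledgements the coordinator waits for.
    return upper_bound_ops(nodes, per_node_ops, rf)


def read_ceiling(nodes: int, per_node_ops: float, replicas_read: int) -> float:
    # replicas_read: replicas the CL requires to answer (ONE -> 1,
    # QUORUM -> quorum(rf)). Ignores read repair and speculative retries.
    return upper_bound_ops(nodes, per_node_ops, replicas_read)


if __name__ == "__main__":
    per_node = 10_000.0  # hypothetical single-node ops/s measured at RF=CL=1
    print(upper_bound_ops(6, per_node))          # 60000.0 (6 nodes, RF=1)
    print(write_ceiling(6, per_node, rf=3))      # 20000.0 (RF=3 writes)
    print(read_ceiling(6, per_node, quorum(3)))  # 30000.0 (QUORUM reads, RF=3)
```

This is only a ceiling: real clusters fall short of it (secondary-index queries, for instance, fan out across nodes), which is the gap Robert's objection below is about.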
On Thu, Jan 14, 2016 at 12:23 PM Robert Wille <rwi...@fold3.com> wrote:

> I disagree. I think that you can extrapolate very little information about
> RF>1 and CL>1 by benchmarking with RF=1 and CL=1.
>
> On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal <anur...@berkeley.edu>
> wrote:
>
> Hi John,
>
> Thanks for responding!
>
> The aim of this benchmark was not to benchmark Cassandra as an end-to-end
> distributed system, but to understand a breakdown of the performance. For
> instance, if we understand the performance characteristics that we can
> expect from a single-machine Cassandra instance with RF=Consistency=1, we
> can have a good estimate of what the distributed performance with higher
> replication factors and consistency levels is going to look like. Even in
> the ideal case, the performance improvement would scale at most linearly
> with more machines and replicas.
>
> That being said, I still want to understand whether this is the
> performance I should expect for the setup I described; if the performance
> for the current setup can be improved, then clearly the performance for a
> production setup (with multiple nodes and replicas) would also improve. Does
> that make sense?
>
> Thanks!
> Anurag
>
> On Jan 6, 2016, at 9:31 AM, John Schulz <sch...@pythian.com> wrote:
>
> Anurag,
>
> Unless you are planning on continuing to use only one machine with RF=1,
> benchmarking a single system using RF=Consistency=1 is mostly a waste of
> time. If you are going to use RF=1 and a single host, then why use Cassandra
> at all? Plain old relational DBs should do the job just fine.
>
> Cassandra is designed to be distributed. You won't get the full impact of
> how it scales and the limits on scaling unless you benchmark a distributed
> system. For example, the scaling impact of secondary indexes will not be
> visible on a single node.
>
> John
>
> On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal <anur...@berkeley.edu>
> wrote:
>
>> Hi,
>>
>> I’ve been benchmarking Cassandra to get an idea of how the performance
>> scales with more data on a single machine. I just wanted to get some
>> feedback on whether these are the numbers I should expect.
>>
>> The benchmarks are quite simple. I measure the latency and throughput
>> for two kinds of queries:
>>
>> 1. get() queries - These fetch an entire row for a given primary key.
>> 2. search() queries - These fetch all the primary keys for rows where a
>> particular column matches a particular value (e.g., “name” is “John
>> Smith”).
>>
>> Indexes are constructed for all columns that are queried.
>>
>> *Dataset*
>>
>> The dataset used comprises ~1.5KB records (on average) when
>> represented as CSV; there are 105 attributes in each record.
>>
>> *Queries*
>>
>> For get() queries, randomly generated primary keys are used.
>>
>> For search() queries, column values are selected such that their total
>> number of occurrences in the dataset is between 1 and 4000. For example, a
>> query for “name” = “John Smith” would only be performed if the number of
>> rows that contain that value lies between 1 and 4000.
>>
>> The results for the benchmarks are provided below:
>>
>> *Latency Measurements*
>>
>> The latency measurements are an average of 10000 queries.
>>
>> [latency results not preserved]
>>
>> *Throughput Measurements*
>>
>> The throughput measurements were repeated for 1-16 client threads, and
>> the numbers reported for each input size are for the configuration (i.e., #
>> client threads) with the highest throughput.
>>
>> [throughput results not preserved]
>>
>> Any feedback here would be greatly appreciated!
>>
>> Thanks!
>> Anurag
>
> --
>
> John H. Schulz
> Principal Consultant
> Pythian - Love your data
>
> sch...@pythian.com | Linkedin
> www.linkedin.com/pub/john-schulz/13/ab2/930/
> Mobile: 248-376-3380
> *www.pythian.com <http://www.pythian.com/>*
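The methodology Anurag describes (search values chosen to occur 1 to 4000 times, mean latency over sequentially issued queries, and peak throughput over 1-16 client threads) can be sketched as a small, driver-agnostic Python harness. The `query` callable stands in for whatever issues a get() or search() against Cassandra; it and every function name here are hypothetical, and no particular driver API is assumed. Threads are used on the assumption that each call blocks on network I/O, which releases Python's GIL.

```python
import statistics
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Sequence, Tuple


def eligible_search_values(rows: Iterable[dict], column: str,
                           lo: int = 1, hi: int = 4000) -> List[Any]:
    """Values of `column` that occur between lo and hi times (inclusive)."""
    counts = Counter(row[column] for row in rows)
    return [value for value, n in counts.items() if lo <= n <= hi]


def mean_latency(query: Callable[[Any], Any], args: Sequence[Any]) -> float:
    """Average wall-clock seconds per call, issuing calls one at a time."""
    samples = []
    for arg in args:
        start = time.perf_counter()
        query(arg)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def throughput(query: Callable[[Any], Any], args: Sequence[Any], threads: int) -> float:
    """Completed calls per second with `threads` concurrent client threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Consuming the iterator waits for every call and re-raises failures.
        for _ in pool.map(query, args):
            pass
    return len(args) / (time.perf_counter() - start)


def best_throughput(query: Callable[[Any], Any], args: Sequence[Any],
                    max_threads: int = 16) -> Tuple[int, float]:
    """Try 1..max_threads clients; return (best thread count, its calls/sec)."""
    results = {t: throughput(query, args, t) for t in range(1, max_threads + 1)}
    best = max(results, key=lambda t: results[t])
    return best, results[best]
```

For example, `mean_latency(run_get, keys[:10000])` would mirror the 10000-query latency average, and `best_throughput(run_search, values)` the "best of 1-16 threads" throughput figure, where `run_get`, `run_search`, `keys`, and `values` are hypothetical placeholders.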