Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
Hi Jack,
> So, your 1GB input size means roughly 716 thousand rows of data and 128GB means roughly 92 million rows, correct?
Yes, that's correct.
> Are your gets and searches returning single rows, or a significant number of rows?
Like I mentioned in my first email, get always returns a s

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
Thanks for that clarification. So, your 1GB input size means roughly 716 thousand rows of data and 128GB means roughly 92 million rows, correct? FWIW, a best practice recommendation is that you avoid using secondary indexes in favor of using "query tables" - store the same data in multiple tables
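To illustrate the "query tables" idea from the message above: the same rows are written once per access pattern, each copy keyed by what that pattern looks up, so a search becomes a direct lookup rather than a secondary-index query. The sketch below is a minimal in-memory model, not Cassandra code; the two dicts stand in for two tables, and `SEARCHABLE`, `insert`, `get`, `search` and the `city` attribute are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical attribute(s) to search on; not taken from the original benchmark.
SEARCHABLE = ("city",)

# In-memory stand-ins for two query tables holding the same rows:
# records_by_id is keyed by record id (serves gets),
# records_by_attr is keyed by (attribute, value) (serves searches).
records_by_id = {}
records_by_attr = defaultdict(dict)


def insert(record_id, record):
    """Write the same row into every query table (denormalized writes)."""
    records_by_id[record_id] = record
    for attr in SEARCHABLE:
        records_by_attr[(attr, record[attr])][record_id] = record


def get(record_id):
    """Point lookup on the id-keyed table."""
    return records_by_id.get(record_id)


def search(attr, value):
    """Direct lookup on the attribute-keyed table; no scan over all rows."""
    return list(records_by_attr.get((attr, value), {}).values())


insert(1, {"name": "a", "city": "Berkeley"})
insert(2, {"name": "b", "city": "Oakland"})
print(get(2))                      # {'name': 'b', 'city': 'Oakland'}
print(search("city", "Berkeley"))  # [{'name': 'a', 'city': 'Berkeley'}]
```

The trade-off is extra write work and storage in exchange for cheap reads. This sketch also ignores updates that change a searched attribute (the old entry would linger), which an application using real query tables has to handle itself.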

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
To clarify: Input size is the size of the dataset as a CSV file, before loading it into Cassandra; for each input size, the number of columns is fixed but the number of rows is different. By 1.5KB record, I meant that each row, when represented as a CSV entry, occupies 1500 bytes. I've used the
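The row counts discussed in this thread follow from dividing the CSV input size by the ~1500-byte row size. A minimal sketch of that arithmetic, assuming "GB" here means GiB (2**30 bytes), which is what reproduces the ~716 thousand and ~92 million figures (with decimal GB, 10**9 bytes, 1GB would give only ~667 thousand rows):

```python
ROW_BYTES = 1500  # each row occupies ~1500 bytes as a CSV entry


def estimated_rows(input_gib, row_bytes=ROW_BYTES):
    """Approximate row count for a CSV of input_gib GiB (assumes GiB, not GB)."""
    return int(input_gib * 2**30 // row_bytes)


for size in (1, 128):
    print(f"{size} GiB -> ~{estimated_rows(size):,} rows")
# 1 GiB -> ~715,827 rows
# 128 GiB -> ~91,625,968 rows
```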

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
What exactly is "input size" here (1GB to 128GB)? I mean, the test spec "The dataset used comprises of ~1.5KB records... there are 105 attributes in each record." Does each test run have exactly the same number of rows and columns and you're just making each column bigger, or what? Cassandra does

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jonathan Haddad
I think you actually get a really useful metric by benchmarking 1 machine. You understand your cluster's theoretical maximum performance, which would be nodes * queries per node. Yes, adding in replication and CL is important, but 1 machine lets you isolate certain performance metrics. On Thu, J
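The "nodes * queries per node" upper bound above can be written down directly, and dividing by the number of replicas each operation touches shows why replication (the point raised in the reply that follows) lowers it. This is an idealized model, not a Cassandra formula: it assumes perfectly even load, no coordinator or network overhead, and equal work on every replica touched. In Cassandra, writes are sent to all RF replicas regardless of consistency level, so for writes `replicas_per_op` is RF. The function name and the 10,000 ops/s figure are hypothetical.

```python
def ideal_cluster_throughput(nodes, per_node_ops, replicas_per_op=1):
    """Idealized upper bound on cluster-wide operations per second.

    Assumes even load, zero coordination/network overhead, and that each
    operation costs the same work on every replica it touches.
    """
    if nodes < 1 or replicas_per_op < 1:
        raise ValueError("nodes and replicas_per_op must be at least 1")
    if replicas_per_op > nodes:
        raise ValueError("an operation cannot touch more replicas than nodes")
    return nodes * per_node_ops / replicas_per_op


PER_NODE_OPS = 10_000  # hypothetical single-node benchmark result (ops/s)
print(ideal_cluster_throughput(6, PER_NODE_OPS))     # 60000.0 (one replica per op)
print(ideal_cluster_throughput(6, PER_NODE_OPS, 3))  # 20000.0 (e.g. writes at RF=3)
```

Real clusters will land below this bound; the single-node number only fixes the `per_node_ops` input.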

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Robert Wille
I disagree. I think that you can extrapolate very little information about RF>1 and CL>1 by benchmarking with RF=1 and CL=1. On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal <anur...@berkeley.edu> wrote: Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassa
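One way to see what an RF=1, CL=1 run leaves out is to count the replicas each consistency level must hear from. For a single data center, Cassandra's QUORUM is floor(RF / 2) + 1, and reads are guaranteed to overlap the latest acknowledged write when R + W > RF. The sketch below models only that arithmetic (single data center, a handful of levels; LOCAL_* and EACH_* levels are omitted), and the function names are hypothetical.

```python
def replicas_required(level, rf):
    """Replicas that must respond for a consistency level (single data center)."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    required = levels[level]
    if required > rf:
        raise ValueError(f"{level} needs {required} replicas but RF={rf}")
    return required


def reads_see_latest_write(read_level, write_level, rf):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


print(reads_see_latest_write("ONE", "ONE", 1))        # True: one replica does everything
print(reads_see_latest_write("ONE", "ONE", 3))        # False
print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True: 2 + 2 > 3
```

With RF=1 every level collapses to a single replica, so none of the multi-replica waiting that QUORUM or ALL involve ever shows up in the measurements.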

Re: Cassandra Performance on a Single Machine

2016-01-13 Thread Anurag Khandelwal
Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassandra as an end-to-end distributed system, but to understand a breakdown of the performance. For instance, if we understand the performance characteristics that we can expect from a single-machine Cassandra ins

Re: Cassandra Performance on a Single Machine

2016-01-06 Thread John Schulz
Anurag, Unless you are planning on continuing to use only one machine with RF=1, benchmarking a single system using RF=Consistency=1 is mostly a waste of time. If you are going to use RF=1 and a single host, then why use Cassandra at all? Plain old relational DBs should do the job just fine. Cassan