Hi Jack,
> So, your 1GB input size means roughly 716 thousand rows of data and 128GB
> means roughly 92 million rows, correct?
Yes, that's correct.
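For reference, the row counts above follow from dividing the input size by the 1500-byte CSV record size mentioned in this thread. A minimal sketch of that arithmetic, assuming "GB" is read as 2^30 bytes (the function and constant names are only illustrative):

```python
RECORD_BYTES = 1500  # per-row CSV size stated in this thread
GIB = 2 ** 30        # assumption: "GB" is interpreted as 2^30 bytes


def estimated_rows(input_size_gb, record_bytes=RECORD_BYTES):
    """Approximate number of CSV rows in an input of the given size."""
    return (input_size_gb * GIB) // record_bytes


if __name__ == "__main__":
    for size_gb in (1, 128):
        print(f"{size_gb}GB -> ~{estimated_rows(size_gb):,} rows")
```

Under that assumption, 1GB gives about 716 thousand rows and 128GB about 92 million, matching the figures quoted above.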
> Are your gets and searches returning single rows, or a significant number of
> rows?
Like I mentioned in my first email, get always returns a s
Thanks for that clarification.
So, your 1GB input size means roughly 716 thousand rows of data and 128GB
means roughly 92 million rows, correct?
FWIW, a best practice recommendation is that you avoid using secondary
indexes in favor of using "query tables" - store the same data in multiple
tables
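The "query tables" idea can be illustrated with a toy in-memory model. This is only a sketch of the data-modeling pattern, not Cassandra's API: the class, the `id` and `city` columns, and the method names are all hypothetical. Each write is duplicated into one structure per query pattern, so each read becomes a direct key lookup instead of a secondary-index search.

```python
from collections import defaultdict


class QueryTables:
    """Toy model: one 'table' per query pattern (insert-only)."""

    def __init__(self):
        self.by_id = {}                   # serves "get by primary key"
        self.by_city = defaultdict(list)  # serves "search by city"

    def insert(self, row):
        # The same row is written to every query table.
        self.by_id[row["id"]] = row
        self.by_city[row["city"]].append(row)

    def get(self, row_id):
        return self.by_id.get(row_id)

    def search_by_city(self, city):
        # .get() avoids creating an empty entry for unknown cities.
        return list(self.by_city.get(city, []))


if __name__ == "__main__":
    tables = QueryTables()
    tables.insert({"id": 1, "city": "Berkeley"})
    tables.insert({"id": 2, "city": "Berkeley"})
    print(tables.get(1), len(tables.search_by_city("Berkeley")))
```

The trade-off is extra write work and storage in exchange for reads that never need an index scan.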
To clarify: Input size is the size of the dataset as a CSV file, before loading
it into Cassandra; for each input size, the number of columns is fixed but the
number of rows is different. By 1.5KB record, I meant that each row, when
represented as a CSV entry, occupies 1500 bytes. I've used the
What exactly is "input size" here (1GB to 128GB)? I mean, the test spec says "The
dataset used comprises of ~1.5KB records... there are 105 attributes in
each record." Does each test run have exactly the same number of rows and
columns and you're just making each column bigger, or what?
Cassandra does
I think you actually get a really useful metric by benchmarking 1 machine.
You understand your cluster's theoretical maximum performance, which would
be (number of nodes) * (queries a single node can serve). Yes, adding in replication and CL is
important, but 1 machine lets you isolate certain performance metrics.
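As a rough sketch of the ceiling described above: the single-node measurement is multiplied by the node count. The function name and the example numbers are hypothetical, and replication and consistency-level overhead are deliberately not modeled, so this is an ideal upper bound and not a prediction.

```python
def cluster_upper_bound(nodes, per_node_qps):
    """Ideal ceiling: every node contributes its full single-node throughput.

    Ignores replication factor and consistency level, which lower the
    throughput a real cluster can reach.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if per_node_qps < 0:
        raise ValueError("per_node_qps must be non-negative")
    return nodes * per_node_qps


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(cluster_upper_bound(nodes=10, per_node_qps=5000))
```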
On Thu, J
I disagree. I think that you can extrapolate very little information about RF>1
and CL>1 by benchmarking with RF=1 and CL=1.
On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal
<anur...@berkeley.edu> wrote:
Hi John,
Thanks for responding!
The aim of this benchmark was not to benchmark Cassa
Hi John,
Thanks for responding!
The aim of this benchmark was not to benchmark Cassandra as an end-to-end
distributed system, but to understand a breakdown of the performance. For
instance, if we understand the performance characteristics that we can expect
from a single-machine Cassandra ins
Anurag,
Unless you are planning on continuing to use only one machine with RF=1,
benchmarking a single system using RF=Consistency=1 is mostly a waste of
time. If you are going to use RF=1 and a single host, then why use Cassandra
at all? Plain old relational DBs should do the job just fine.
Cassan