Hi Joy,

Are you resetting your data after each test run?  I wonder if your tests
are actually causing you to fall behind on data grooming tasks such as
compaction, and so performance suffers for your later tests.
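
(If you want to rule that out: check for a compaction backlog with
nodetool compactionstats before each run, and truncate the table between
runs so every run starts from the same on-disk state.)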

There are *so many* factors which can affect performance that, without
reviewing your test methodology in great detail, it's really hard to say
whether there are flaws: you might be hitting an antipattern, causing an
atypical number of cache hits or misses, producing GC pressure in the
write path, and so forth.

I *can* say that 28k writes per second looks just a little low, but it
depends a lot on your network, hardware, and write patterns (e.g., data
size).  For a little performance test suite I wrote, with parallel batched
writes on a 3 node rf=3 test cluster, I got about 86k writes per second.
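
To give a concrete sense of what I mean by batched writes, here's a
minimal sketch in the same spirit as your C++ client. An UNLOGGED batch
groups several inserts into one round trip; note that the real Thrift
call also takes compression and consistency parameters (elided here to
match your pseudocode), and make_row_values() is a hypothetical stand-in
for your random data generator:

    #include <string>

    // Hypothetical helper: returns one row's literal values, e.g.
    // "('id1val', 'tsval', 'id2val', 'msgval')". In a real test this
    // would produce your random data.
    static std::string make_row_values() {
        return "('id1val', 'tsval', 'id2val', 'msgval')";
    }

    // Group n INSERTs into a single UNLOGGED batch so each round trip
    // carries several rows instead of one.
    std::string build_batch(int n) {
        std::string q = "BEGIN UNLOGGED BATCH ";
        for (int i = 0; i < n; ++i) {
            q += "INSERT INTO table (id1, ts, id2, msg) VALUES "
                 + make_row_values() + "; ";
        }
        q += "APPLY BATCH;";
        return q;
    }

    // From each worker thread, something like:
    //   thrift_client.execute_cql3_query(build_batch(20));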

Also, focusing exclusively on max latency is going to cause you some
trouble, especially on the magnetic media you're using.  Between ill-timed
GC pauses and the inconsistent performance characteristics of magnetic
media, your max numbers will often look significantly worse than your p(99)
or p(999) numbers.
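
If your harness doesn't already report these, percentiles are cheap to
compute once you keep the per-operation samples around; a minimal sketch,
assuming you record one latency (in ms) per completed operation:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Return the p-th percentile (0.0..1.0) of recorded latencies.
    // Takes the vector by value so the caller's sample order is kept.
    // Assumes the vector is non-empty.
    double percentile(std::vector<double> samples, double p) {
        std::sort(samples.begin(), samples.end());
        std::size_t idx =
            static_cast<std::size_t>(p * (samples.size() - 1));
        return samples[idx];
    }

    // e.g. percentile(lat, 0.95), percentile(lat, 0.99),
    //      percentile(lat, 0.999), percentile(lat, 1.0)  // == max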

All this said, one node will often look better than several nodes for
certain patterns because it completely eliminates proxy (coordinator) write
time: all writes are local writes.  It's an over-simple case that doesn't
reflect any practical production use of Cassandra, so it's probably not
worth even including in your tests.  I would recommend starting at 3 nodes
rf=3, and comparing against 6 nodes rf=6.  Make sure you're staying on top
of compaction and aren't seeing garbage collections in the logs (either of
those will pollute your results with variability you can't account for at
small sample sizes of ~1 million).
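
(Concretely: grep system.log for GCInspector entries, and watch
nodetool compactionstats for a growing pending-tasks count.)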

If you expect to sustain write volumes like this, you'll find these
clusters are sized too small (on that hardware you won't keep up with
compaction), and your tests are again testing scenarios you wouldn't
actually see in production.

On Sat Dec 06 2014 at 7:09:18 AM kong <kongjiali...@gmail.com> wrote:

> Hi,
>
> I am doing a stress test on Datastax Cassandra Community 2.1.2, not using
> the provided stress test tool, but my own stress-test client code instead
> (I wrote some C++ stress test code). My Cassandra cluster is deployed on
> Amazon EC2, using the Datastax Community AMI (HVM instances) from the
> Datastax documentation, and I am not using EBS, just the ephemeral storage
> by default. The Cassandra server nodes are of EC2 type m3.xlarge. I use
> another EC2 instance, of type r3.8xlarge, for my stress test client. Both
> the Cassandra server nodes and the stress test client node are in us-east.
> I test Cassandra clusters made up of 1 node, 2 nodes, and 4 nodes
> separately. I run the INSERT test and the SELECT test separately, but the
> performance does not increase linearly when new nodes are added. I also
> get some weird results. My test results are as follows (I do 1 million
> operations and I try to get the best QPS while the max latency is no more
> than 200 ms; the latencies are measured from the client side, and the QPS
> is calculated as total_operations/total_time).
>
>
>
> INSERT (write):
>
> Nodes  RF  QPS    Avg(ms)  Min(ms)  p95(ms)  p99(ms)  p999(ms)  Max(ms)
> 1      1   18687  2.08     1.48     2.95     5.74     52.8      205.4
> 2      1   20793  3.15     0.84     7.71     41.35    88.7      232.7
> 2      2   22498  3.37     0.86     6.04     36.1     221.5     649.3
> 4      1   28348  4.38     0.85     8.19     64.51    169.4     251.9
> 4      3   28631  5.22     0.87     18.68    68.35    167.2     288
>
>
>
> SELECT (read):
>
> Nodes  RF  QPS    Avg(ms)  Min(ms)  p95(ms)  p99(ms)  p999(ms)  Max(ms)
> 1      1   24498  4.01     1.51     7.6      12.51    31.5      129.6
> 2      1   28219  3.38     0.85     9.5      17.71    39.2      152.2
> 2      2   35383  4.06     0.87     9.71     21.25    70.3      215.9
> 4      1   34648  2.78     0.86     6.07     14.94    30.8      134.6
> 4      3   52932  3.45     0.86     10.81    21.05    37.4      189.1
>
>
>
> The test data I use is generated randomly, and the schema I use is as
> follows (I use cqlsh to create the column family/table):
>
> CREATE TABLE table (
>     id1  varchar,
>     ts   varchar,
>     id2  varchar,
>     msg  varchar,
>     PRIMARY KEY (id1, ts, id2)
> );
>
> So the fields are all strings, and I generate each character of each
> string randomly, using srand(time(0)) and rand() in C++, so I think my
> test data should be uniformly distributed across the Cassandra cluster.
> In my client stress-test code I use the Thrift C++ interface, and the
> basic operations I do are like:
>
> thrift_client.execute_cql3_query("INSERT INTO table (id1, ts, id2, msg)
> VALUES (xxx, xxx, xxx, xxx)");
>
> thrift_client.execute_cql3_query("SELECT * FROM table WHERE id1=xxx");
>
> Each data entry I INSERT or SELECT is around 100 characters.
>
> On my stress test client, I create several threads to send the read and
> write requests, each thread having its own Thrift client, and at the
> beginning all the Thrift clients connect evenly across the Cassandra
> server nodes. For example, in a 4 node cluster I create 160 Thrift
> clients, and each group of 40 clients connects to one server node.
>
>
>
> So,
>
> 1. Could anyone help me explain my test results? Why does the performance
> (QPS) increase only slightly when new nodes are added?
>
> 2. I have learned from the materials that Cassandra has better write
> performance than read performance, but in my case the read performance is
> better. Why?
>
> 3. I also use OpsCenter to monitor the real-time performance of my
> cluster. But while I measure the average QPS above, the operations/s
> reported by OpsCenter are around 10000+ at the write peak and 5000+ at
> the read peak. Why is my result inconsistent with OpsCenter's?
>
> 4. Are there any unreasonable things in my test method, such as the test
> data or the QPS calculation?
>
>
>
> Thank you very much,
>
> Joy
>
