Does anyone have any EC2 benchmarks/experiences they can share? I am
trying to get a sense of what to expect from a production cluster on
EC2 so that I can compare my application's performance against a sane
baseline. What I have done so far is:
1. Launched a 4-node cluster of m1.xlarge instances in the same
availability zone using PyStratus
(https://github.com/digitalreasoning/PyStratus). Each node has the
following specs (according to Amazon):
15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
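For what it's worth, those advertised specs are easy to sanity check
from a shell on each node (nothing Cassandra-specific here):

grep MemTotal /proc/meminfo          # expect ~15 GB
grep -c ^processor /proc/cpuinfo     # expect 4 virtual cores
df -h /mnt                           # ephemeral instance storage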
2. Changed the default PyStratus directories in order to have commit
logs on the root partition and data files on ephemeral storage:
commitlog_directory: /var/cassandra-logs
data_file_directories: [/mnt/cassandra-data]
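The idea behind the split is to keep the commit log's sequential
appends off the device doing the data files' random I/O. A quick check
that the two directories really live on different devices (plain df,
nothing PyStratus-specific):

df -h /var/cassandra-logs /mnt/cassandra-data
# expect two different filesystems: the root volume and the ephemeral /mnt volume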
3. Gave each node 10 GB of MAX_HEAP and 1 GB of HEAP_NEWSIZE in
conf/cassandra-env.sh.
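Concretely, that's just these two overrides in conf/cassandra-env.sh
(the variable names the stock script already uses):

MAX_HEAP_SIZE="10G"
HEAP_NEWSIZE="1G"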
4. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 100`
on a separate m1.large instance:
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9832712,7120,7120,0.004948514851485148,842
9907616,7490,7490,0.0043189949802413755,852
9978357,7074,7074,0.004560353967289125,863
10000000,2164,2164,0.004065933558194335,867
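The writes/sec figures at the end of this mail are just the final
total divided by the final elapsed_time. Assuming the stress output
above was captured to a file (stress-4node.csv here is a made-up
name), something like this recovers the number:

tail -1 stress-4node.csv | awk -F, '{ printf "%.0f writes/sec\n", $1/$5 }'
# 10000000 ops / 867 s = ~11534 writes/sec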
5. Truncated Keyspace1.Standard1:
# /usr/local/apache-cassandra/bin/cassandra-cli -host localhost -port 9160
Connected to: "Test Cluster" on x.x.x.x/9160
Welcome to cassandra CLI.
Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] truncate Standard1;
null
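The bare "null" is just what the CLI prints for truncate's return
value. I'm not 100% sure of the exact cfstats field names in this
version, but something along these lines should confirm the column
family is actually empty (truncate snapshots the old SSTables, so
total disk usage won't necessarily drop):

/usr/local/apache-cassandra/bin/nodetool -h localhost cfstats | grep -A 20 Standard1
# the live space used / key counts for Standard1 should be back near zero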
6. Expanded the cluster to 8 nodes using PyStratus and sanity-checked
using nodetool:
# /usr/local/apache-cassandra/bin/nodetool -h localhost ring
Address    Status  State   Load     Owns    Token
x.x.x.x    Up      Normal  1.3 GB   12.50%  21267647932558653966460912964485513216
x.x.x.x    Up      Normal  3.06 GB  12.50%  42535295865117307932921825928971026432
x.x.x.x    Up      Normal  1.16 GB  12.50%  63802943797675961899382738893456539648
x.x.x.x    Up      Normal  2.43 GB  12.50%  85070591730234615865843651857942052864
x.x.x.x    Up      Normal  1.22 GB  12.50%  106338239662793269832304564822427566080
x.x.x.x    Up      Normal  2.74 GB  12.50%  127605887595351923798765477786913079296
x.x.x.x    Up      Normal  1.22 GB  12.50%  148873535527910577765226390751398592512
x.x.x.x    Up      Normal  2.57 GB  12.50%  170141183460469231731687303715884105728
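One thing that jumps out of that ring is the alternating Load column
(~1.2 GB vs. ~2.5-3 GB) even though each node owns 12.50%. After an
expansion, pre-existing nodes keep the data for ranges they no longer
own until cleanup runs on them, which may explain the imbalance:

# run on each of the original four nodes once the new nodes have joined
/usr/local/apache-cassandra/bin/nodetool -h localhost cleanup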
7. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000 -t 100`
on a separate m1.large instance again:
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9880360,9649,9649,0.003210443956226165,720
9942718,6235,6235,0.003206934154398794,731
9997035,5431,5431,0.0032615939761032457,741
10000000,296,296,0.002660033726812816,742
In a nutshell, 4 nodes inserted at 11,534 writes/sec (10M ops / 867 s)
and 8 nodes inserted at 13,477 writes/sec (10M ops / 742 s).
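If the single stress client is the bottleneck rather than the cluster,
a sweep over its thread count should show it; same flags as above,
e.g.:

# expand the node list as in the runs above
for t in 50 100 200 400; do
  contrib/stress/bin/stress -d node1,..,node8 -n 1000000 -t $t
done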
Those numbers seem a little low to me, but I don't have anything to
compare them to. I'd like to hear others' opinions before I spin my
wheels with the number of nodes, threads, memtable, memory, and/or GC
settings. Cheers, Alex.