Pardon the long delay - went on holiday and got sidetracked before I could return to this project.

@Joaquin - The DataStax AMI uses a RAID0 configuration on an instance store's ephemeral drives.
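(If anyone wants to double check that on a running node, something like `cat /proc/mdstat` or `mdadm --detail /dev/md0` will show the array and its member disks -- /dev/md0 is an assumption on my part; the device name may differ on the AMI.)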

@Jonathan - you were correct about the client node being the bottleneck. I set up 3 XL client instances to run contrib/stress against the original 4-node XL Cassandra cluster and incrementally raised the number of threads on each client until I started seeing timeouts.

I set the following memory settings for the client JVMs: -Xms2G -Xmx10G

I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of available memory). On the Cassandra nodes I kept the default AMI cassandra.yaml settings until timeouts started appearing, then raised concurrent_writes to 300 based on a (perhaps arbitrary?) recommendation in 'Cassandra: The Definitive Guide' to scale that value with the number of client threads (timeouts first appeared at 200 threads per client, i.e. 600 total). The client nodes were in the same AZ as the Cassandra nodes, and I set the --keep-going option on the clients for every other run at >= 200 threads.
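For reference, a typical client run looked roughly like this (thread count varied per run, --keep-going was only used on the alternating runs noted above, and node1,..,node4 stand in for the Cassandra nodes' addresses):

contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 200 --keep-going

and the only cassandra.yaml change from the AMI defaults on the servers was:

concurrent_writes: 300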

Results
+----------+----------+----------+----------+----------+----------+----------+
| Server   | Client   | --keep-  | Columns  | Client   | Total    | Combined |
| Nodes    | Nodes    | going    |          | Threads  | Threads  | Rate     |
+==========+==========+==========+==========+==========+==========+==========+
| 4        | 3        | N        | 10000000 | 25       | 75       | 13771    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 10000000 | 50       | 150      | 16853    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 10000000 | 75       | 225      | 18511    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 10000000 | 150      | 450      | 20013    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 7574241  | 200      | 600      | 22935    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | Y        | 10000000 | 200      | 600      | 19737    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 9843677  | 250      | 750      | 20869    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | Y        | 10000000 | 250      | 750      | 21217    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 5015711  | 300      | 900      | 24177    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | Y        | 10000000 | 300      | 900      | 206134   |
+----------+----------+----------+----------+----------+----------+----------+
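(Taking the 300-thread run without --keep-going as an example, 24177 combined writes/s across 4 server nodes works out to roughly 6,000 writes/s per node.)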

Other Observations
* `vmstat` showed no swapping during runs
* `iostat -x` always showed 0's for avgqu-sz, await, and %util on the /raid0 (data) partition; 0-150, 0-334ms, and 0-60% respectively for the / (commitlog) partition * %steal from iostat ranged from 8-26% every run (one node had an almost constant 26% while the others averaged closer to 10%) * `nodetool tpstats` never showed more than 10's of Pending ops in RequestResponseStage; no more than 1-2K Pending ops in MutationStage. Usually a single node would register ops; the others would be 0's * After all test runs, Memtable Switch Count was 1385 for Keyspace1.Standard1 * Load average on the Cassandra nodes was very high the entire time, especially for tests where each client ran > 100 threads. Here's one sample @ 200 threads each (600 total):

[i-94e8d2fb] alex@cassandra-qa-1:~$ uptime
17:18:26 up 1 day, 19:04,  2 users,  load average: 20.18, 15.20, 12.87
[i-a0e5dfcf] alex@cassandra-qa-2:~$ uptime
17:18:26 up 1 day, 18:52,  2 users,  load average: 22.65, 25.60, 21.71
[i-92dde7fd] alex@cassandra-qa-3:~$ uptime
17:18:26 up 1 day, 18:44,  2 users,  load average: 24.19, 28.29, 20.17
[i-08caf067] alex@cassandra-qa-4:~$ uptime
17:18:26 up 1 day, 18:37,  2 users,  load average: 31.74, 20.99, 13.97

* Average resource utilization on the client nodes was between 10-80% CPU; 5-25% memory depending on # of threads. Load average was always negligible (presumably because there was no I/O)
* After a few runs and truncate operations on Keyspace1.Standard1, the ring became unbalanced before runs (the commands behind these observations are sketched after the second ring listing):

[i-94e8d2fb] alex@cassandra-qa-1:~$ nodetool -h localhost ring
Address         Status State   Load            Owns    Token
127605887595351923798765477786913079296
10.240.114.143  Up     Normal  2.1 GB          25.00%  0
10.210.154.63   Up     Normal  330.19 MB       25.00%  42535295865117307932921825928971026432
10.110.63.247   Up     Normal  361.38 MB       25.00%  85070591730234615865843651857942052864
10.46.143.223   Up     Normal  1.6 GB          25.00%  127605887595351923798765477786913079296

and after runs:

[i-94e8d2fb] alex@cassandra-qa-1:~$ nodetool -h localhost ring
Address         Status State   Load            Owns    Token
127605887595351923798765477786913079296
10.240.114.143  Up     Normal  3.9 GB          25.00%  0
10.210.154.63   Up     Normal  2.05 GB         25.00%  42535295865117307932921825928971026432
10.110.63.247   Up     Normal  2.07 GB         25.00%  85070591730234615865843651857942052864
10.46.143.223   Up     Normal  3.33 GB         25.00%  127605887595351923798765477786913079296
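For completeness, the samples above came from commands along these lines on each Cassandra node (the 5-second sampling interval is approximate):

vmstat 5
iostat -x 5
nodetool -h localhost tpstats
nodetool -h localhost cfstats    # where Memtable Switch Count is reported per column family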

Based on the above, would I be correct in assuming that frequent memtable flushes and/or commitlog I/O are the likely bottlenecks? Could %steal be partially contributing to the low throughput numbers as well? If a single XL node can do ~12k writes/s, would it be reasonable to expect ~40k writes/s with the above workload and number of nodes?

Thanks for your help, Alex.

On 4/25/11 11:23 AM, Joaquin Casares wrote:
Did the images have EBS storage or Instance Store storage?

Typically EBS volumes aren't the best to be benchmarking against:
http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html

Joaquin Casares
DataStax
Software Engineer/Support



On Wed, Apr 20, 2011 at 5:12 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

    A few months ago I was seeing 12k writes/s on a single EC2 XL. So
    something is wrong.

    My first suspicion is that your client node may be the bottleneck.

    On Wed, Apr 20, 2011 at 2:56 PM, Alex Araujo
    <cassandra-us...@alex.otherinbox.com> wrote:
> Does anyone have any Ec2 benchmarks/experiences they can share? I am trying
    > to get a sense for what to expect from a production cluster on
    Ec2 so that I
    > can compare my application's performance against a sane
    baseline.  What I
    > have done so far is:
    >
    > 1. Launched a 4 node cluster of m1.xlarge instances in the same
    availability
    > zone using PyStratus
    (https://github.com/digitalreasoning/PyStratus).  Each
    > node has the following specs (according to Amazon):
    > 15 GB memory
    > 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
    > 1,690 GB instance storage
    > 64-bit platform
    >
    > 2. Changed the default PyStratus directories in order to have
    commit logs on
    > the root partition and data files on ephemeral storage:
    > commitlog_directory: /var/cassandra-logs
    > data_file_directories: [/mnt/cassandra-data]
    >
    > 2. Gave each node 10GB of MAX_HEAP; 1GB HEAP_NEWSIZE in
    > conf/cassandra-env.sh
    >
    > 3. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000
    -t 100` on a
    > separate m1.large instance:
    > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
    > ...
    > 9832712,7120,7120,0.004948514851485148,842
    > 9907616,7490,7490,0.0043189949802413755,852
    > 9978357,7074,7074,0.004560353967289125,863
    > 10000000,2164,2164,0.004065933558194335,867
    >
    > 4. Truncated Keyspace1.Standard1:
    > # /usr/local/apache-cassandra/bin/cassandra-cli -host localhost
    -port 9160
    > Connected to: "Test Cluster" on x.x.x.x/9160
    > Welcome to cassandra CLI.
    >
    > Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
    > [default@unknown] use Keyspace1;
    > Authenticated to keyspace: Keyspace1
    > [default@Keyspace1] truncate Standard1;
    > null
    >
    > 5. Expanded the cluster to 8 nodes using PyStratus and sanity
    checked using
    > nodetool:
    > # /usr/local/apache-cassandra/bin/nodetool -h localhost ring
    > Address         Status State   Load            Owns
    > Token
    > x.x.x.x  Up     Normal  1.3 GB          12.50%
    > 21267647932558653966460912964485513216
    > x.x.x.x   Up     Normal  3.06 GB         12.50%
    > 42535295865117307932921825928971026432
    > x.x.x.x     Up     Normal  1.16 GB         12.50%
    > 63802943797675961899382738893456539648
    > x.x.x.x   Up     Normal  2.43 GB         12.50%
    > 85070591730234615865843651857942052864
    > x.x.x.x   Up     Normal  1.22 GB         12.50%
    > 106338239662793269832304564822427566080
    > x.x.x.x    Up     Normal  2.74 GB         12.50%
    > 127605887595351923798765477786913079296
    > x.x.x.x    Up     Normal  1.22 GB         12.50%
    > 148873535527910577765226390751398592512
    > x.x.x.x   Up     Normal  2.57 GB         12.50%
    > 170141183460469231731687303715884105728
    >
    > 6. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000
    -t 100` on a
    > separate m1.large instance again:
    > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
    > ...
    > 9880360,9649,9649,0.003210443956226165,720
    > 9942718,6235,6235,0.003206934154398794,731
    > 9997035,5431,5431,0.0032615939761032457,741
    > 10000000,296,296,0.002660033726812816,742
    >
    > In a nutshell, 4 nodes inserted at 11,534 writes/sec and 8 nodes
    inserted at
    > 13,477 writes/sec.
    >
    > Those numbers seem a little low to me, but I don't have anything
    to compare
    > to.  I'd like to hear others' opinions before I spin my wheels
    with
    > number of nodes, threads,  memtable, memory, and/or GC
    settings.  Cheers,
    > Alex.
    >



    --
    Jonathan Ellis
    Project Chair, Apache Cassandra
    co-founder of DataStax, the source for professional Cassandra support
    http://www.datastax.com
