On 5/9/11 9:49 PM, Jonathan Ellis wrote:
> On Mon, May 9, 2011 at 5:58 PM, Alex Araujo <cassandra-> wrote:
>>> How many replicas are you writing?
>> Replication factor is 3.
>
> So you're actually spot on the predicted numbers: you're pushing
> 20k * 3 = 60k "raw" rows/s across your 4 machines.
>
> You might get another 10% or so from increasing memtable thresholds,
> but the bottom line is you're right around what we'd expect to see.
> Furthermore, CPU is the primary bottleneck, which is what you want to
> see on a pure write workload.
That makes a lot more sense. I upgraded the cluster to 4 m2.4xlarge
instances (68GB of RAM / 8 CPU cores each) in preparation for application
stress tests, and the results at 200 threads per client were impressive:
+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Server Nodes | Client Nodes | --keep-going | Columns      | Client       | Total        | Rep Factor   | Test Rate    | Cluster Rate |
|              |              |              |              | Threads      | Threads      |              | (writes/s)   | (writes/s)   |
+==============+==============+==============+==============+==============+==============+==============+==============+==============+
| 4            | 3            | N            | 10000000     | 200          | 600          | 3            | 44644        | 133931       |
+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
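(Each client is an instance of the bundled stress tool; the invocation was
along the lines of "stress.py -o insert -n 10000000 -t 200 --keep-going",
with the exact flags reconstructed from the parameters above, so treat them
as approximate. 3 clients x 44,644 writes/s each gives the ~133,931 writes/s
cluster rate.)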
The issue I'm seeing with app stress tests is that the rate is
comparable/acceptable at first (~100k writes/s) and then degrades
considerably (~48k writes/s) until a flush and restart. CPU usage is
correspondingly high at first (500-700%) and tapers down to 50-200%.
My data model is pretty standard (<This> denotes pseudo-type information):
Users<Column>
    "UserId<32CharHash>": {
        "email<String>": "a...@b.com",
        "first_name<String>": "John",
        "last_name<String>": "Doe"
    }

UserGroups<SuperColumn>
    "GroupId<UUID>": {
        "UserId<32CharHash>": {
            "date_joined<DateTime>": "2011-05-10 13:14.789",
            "date_left<DateTime>": "2011-05-11 13:14.789",
            "active<short>": "0|1"
        }
    }

UserGroupTimeline<Column>
    "GroupId<UUID>": {
        "date_joined<TimeUUID>": "UserId<32CharHash>"
    }

UserGroupStatus<Column>
    "CompositeId('GroupId<UUID>:UserId<32CharHash>')": {
        "active<short>": "0|1"
    }
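For concreteness, here is a rough pycassa sketch of the per-user write path
against this model (the keyspace name, host, and literal values below are
placeholders, not the real ones):

import uuid
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('AppKeyspace', ['10.4.0.1:9160'])  # keyspace/host are placeholders

users    = ColumnFamily(pool, 'Users')
groups   = ColumnFamily(pool, 'UserGroups')         # super column family
timeline = ColumnFamily(pool, 'UserGroupTimeline')  # assumes a TimeUUIDType comparator
status   = ColumnFamily(pool, 'UserGroupStatus')

user_id  = '0123456789abcdef0123456789abcdef'       # 32-char hash placeholder
group_id = str(uuid.uuid4())

# 1) the user's row in Users
users.insert(user_id, {'email': 'a...@b.com',
                       'first_name': 'John',
                       'last_name': 'Doe'})

# 2) a SuperColumn for this user under the group's row in UserGroups
groups.insert(group_id, {user_id: {'date_joined': '2011-05-10 13:14.789',
                                   'active': '1'}})

# 3) a TimeUUID-named column in UserGroupTimeline
timeline.insert(group_id, {uuid.uuid1(): user_id})

# 4) a composite-style row key in UserGroupStatus
status.insert('%s:%s' % (group_id, user_id), {'active': '1'})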
As the sketch shows, every new User costs a row in Users and a
ColumnOrSuperColumn in the other 3 CFs (4 operations total). One notable
difference is that the RAID0 on this instance type (surprisingly) only
contains two ephemeral volumes, and it appears a bit more saturated in
iostat, although not enough to clearly stand out as the bottleneck. Is the
bottleneck in this scenario likely memtable flush and/or commitlog rotation
settings?
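The specific knobs I have in mind are these (names from my cassandra.yaml,
currently at their defaults; written from memory, so treat them as
approximate):

commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_rotation_threshold_in_mb: 128

plus the per-CF memtable thresholds (memtable throughput/operations) that
Jonathan mentioned.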
RF = 2; ConsistencyLevel = One; -Xmx = 6GB; concurrent_writes: 64; all
other settings are the defaults. Thanks, Alex.