On 5/6/11 9:47 PM, Jonathan Ellis wrote:
> On Fri, May 6, 2011 at 5:13 PM, Alex Araujo
> <cassandra-us...@alex.otherinbox.com> wrote:
>> I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of
>> available memory).
> This is going to make GC pauses larger for no good reason.
Good point - only doing writes at the moment. I will revert the change
and raise it conservatively once I add reads to the mix.
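For reference, the heap size lives in conf/cassandra-env.sh; a hedged sketch of a more conservative setting (the variable names match the 0.7-era script, but the 4G/400M values are illustrative assumptions, not a recommendation from this thread):

```shell
# conf/cassandra-env.sh -- sketch only; values are placeholders, not tuned.
# A smaller heap shortens stop-the-world GC pauses, and the RAM left to the
# OS page cache will matter once reads are added to the mix.
MAX_HEAP_SIZE="4G"      # total JVM heap (illustrative)
HEAP_NEWSIZE="400M"     # young-generation size (illustrative)
```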
>> raised concurrent_writes to 300 based on a (perhaps arbitrary?)
>> recommendation in 'Cassandra: The Definitive Guide'
> That's never been a good recommendation.
It seemed to contradict the '8 * number of cores' rule of thumb. I set
that back to the default of 32.
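The '8 * number of cores' rule is easy to sanity-check; a minimal sketch, assuming a 4-core host (the core count is an assumption for the example):

```shell
# Derive concurrent_writes from the "8 * cores" rule of thumb.
cores=4                            # assumed core count for this example
concurrent_writes=$((8 * cores))
echo "concurrent_writes: ${concurrent_writes}"   # prints: concurrent_writes: 32
```

which matches the default of 32 on a 4-core box.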
>> Based on the above, would I be correct in assuming that frequent memtable
>> flushes and/or commitlog I/O are the likely bottlenecks?
> Did I miss where you said what CPU usage was?
I observed a consistent 200-350% initially; 300-380% once 'hot' for all
runs. Here is an average-case sample:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15108 cassandr  20   0 5406m 4.5g  15m S  331 30.4 89:32.50  jsvc
> How many replicas are you writing?
Replication factor is 3.
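With RF=3, each client-acknowledged write fans out to three replica writes, so the per-node load is higher than the client-visible rate suggests. A back-of-the-envelope sketch (the ~21,000 writes/s is rounded from the results in this thread; even spread across the 4 nodes is an assumption):

```shell
# Effective per-node write rate under replication (rough arithmetic).
client_rate=21000                          # approximate client-visible writes/s
rf=3                                       # replication factor
nodes=4                                    # server nodes
replica_writes=$((client_rate * rf))       # writes landing on the cluster
per_node=$((replica_writes / nodes))       # average writes/s per node
echo "${replica_writes} writes/s cluster-wide, ~${per_node}/node"
```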
> Recent testing suggests that putting the commitlog on the raid0 volume
> is better than on the root volume on ec2, since the root isn't really
> a separate device.
I migrated the commitlog to the raid0 volume and retested with the above
changes. I/O appeared more consistent in iostat. Here's an average
case (%util in the teens):
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.84    4.05   13.97    3.04   18.42   23.68

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s     wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00    0.00   0.00    0.00    0.00       0.00      0.00      0.00   0.00   0.00   0.00
xvdb       0.00    0.00   0.00  222.00    0.00   18944.00     85.33     13.80  62.16   0.59  13.00
xvdc       0.00    0.00   0.00  231.00    0.00   19480.00     84.33      5.80  25.11   0.78  18.00
xvdd       0.00    0.00   0.00  228.00    0.00   19456.00     85.33     17.43  76.45   0.57  13.00
xvde       0.00    0.00   0.00  229.00    0.00   19464.00     85.00     10.41  45.46   0.44  10.00
md0        0.00    0.00   0.00  910.00    0.00   77344.00     84.99      0.00   0.00   0.00   0.00
and worst case (%util above 60):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          44.33    0.00   24.54    0.82   15.46   14.85

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdap1     0.00    1.00   0.00    4.00    0.00      40.00     10.00      0.15   37.50  22.50   9.00
xvdb       0.00    0.00   0.00  427.00    0.00   36440.00     85.34     54.12  147.85   1.69  72.00
xvdc       0.00    0.00   1.00  295.00    8.00   25072.00     84.73     34.56   84.32   2.13  63.00
xvdd       0.00    0.00   0.00  355.00    0.00   30296.00     85.34     94.49  257.61   2.17  77.00
xvde       0.00    0.00   0.00  373.00    0.00   31768.00     85.17     68.50  189.33   1.88  70.00
md0        0.00    0.00   1.00 1418.00    8.00  120824.00     85.15      0.00    0.00   0.00   0.00
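For scale, iostat's wsec/s column is in 512-byte sectors (the usual iostat convention; an assumption here), so the md0 figures convert to raw write bandwidth like this:

```shell
# Convert md0 wsec/s (512-byte sectors) to MB/s for the two samples above:
# 77344 wsec/s (average case) and 120824 wsec/s (worst case).
awk 'BEGIN {
    sector = 512                                  # bytes per sector
    printf "avg: %.1f MB/s, worst: %.1f MB/s\n",
           77344 * sector / 1048576,
           120824 * sector / 1048576
}'   # prints: avg: 37.8 MB/s, worst: 59.0 MB/s
```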
Overall, results were roughly the same. The most noticeable difference
was that no timeouts occurred until the number of client threads reached
350 (previously 200):
+----------+----------+----------+----------+----------+----------+-----------------+
| Server   | Client   | --keep-  | Columns  | Client   | Total    | Combined Rate   |
| Nodes    | Nodes    | going    |          | Threads  | Threads  | (writes/s)      |
+==========+==========+==========+==========+==========+==========+=================+
| 4        | 3        | N        | 10000000 | 150      | 450      | 21241           |
+----------+----------+----------+----------+----------+----------+-----------------+
| 4        | 3        | N        | 10000000 | 200      | 600      | 21536           |
+----------+----------+----------+----------+----------+----------+-----------------+
| 4        | 3        | N        | 10000000 | 250      | 750      | 19451           |
+----------+----------+----------+----------+----------+----------+-----------------+
| 4        | 3        | N        | 10000000 | 300      | 900      | 19741           |
+----------+----------+----------+----------+----------+----------+-----------------+
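The table's columns map onto the stress-tool parameters; a hedged sketch of one row's invocation (the flag names are assumptions based on the 0.7-era contrib/py_stress tool and should be checked against `stress.py --help`):

```shell
# Sketch only -- flag names are assumptions, not verified against this setup:
#
#   stress.py -d node1,node2,node3,node4 \   # the 4 server nodes
#             -n 10000000 \                  # Columns: keys to insert
#             -t 150                         # Client Threads (per client node)
#
# "--keep-going" (the table's third column) would continue past errors; it
# was left off (N) in these runs.  Total Threads = client nodes * threads:
echo "total threads: $((3 * 150))"           # prints: total threads: 450
```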
Those results are after I compiled/deployed the latest cassandra-0.7
with the patch for
https://issues.apache.org/jira/browse/CASSANDRA-2578. Thoughts?