On Wed, Jul 7, 2010 at 2:27 AM, Michael Dürgner <m...@duergner.de> wrote:

> Have you done some testing with small nodes already? Because from what we
> saw when trying to run IO-bound services on small instances, their IO
> performance is really bad compared to other instance types, as you can read
> in several blogs.
>
> It would be interesting to hear whether a Cassandra cluster can handle that.
>

I have actually.

I tested on 10 small nodes on Amazon EC2, each with a single EBS volume. I've
been avoiding large instances for now since they cost 4x as much as a small,
so 10 smalls translate to the price of 2.5 larges. We figured it's better to
slice things into more nodes, since with only 2 or 3 nodes, large chunks of
data would have to be moved whenever a node failed.

Under pure write loads with a fairly default config and 3x replication, we
achieved 1,000 writes per second and probably could have pushed it a little
bit more (perhaps to 2k per second). Write speed barely slowed even as we
pushed past 50 million keys. Keys were 255 bytes with a single column
containing 768 bytes.
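
For anyone who wants to reproduce the shape of this load, here's a minimal
client sketch using pycassa (API as in recent versions); the keyspace, column
family, and host are placeholders rather than what we actually ran, and our
harness was different, but the key and column sizes match:

    # Write workload sketch: 255-byte keys, one 768-byte column per key.
    # Keyspace/column family names are placeholders -- adjust to your schema.
    import os
    import time
    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['10.0.0.1:9160'])
    cf = pycassa.ColumnFamily(pool, 'Standard1')

    def random_key():
        # 128 random bytes hex-encoded gives 256 chars; trim to 255
        return os.urandom(128).encode('hex')[:255]

    payload = os.urandom(384).encode('hex')  # 768-byte column value

    count = 0
    start = time.time()
    while True:
        cf.insert(random_key(), {'data': payload})
        count += 1
        if count % 10000 == 0:
            print '%d writes, %.0f/sec' % (count, count / (time.time() - start))

In practice you'd run several of these loops in parallel across client
machines to keep the cluster busy.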

Things got much worse when we introduced reads, however. We did a 50/50
read/write split. IO went up, and nodes failed a couple of hours into the
test with out-of-memory errors. My theory is that the reads competed for disk
IO and slowed memtable flushes, so writes backed up in memory until the nodes
ran out of heap.

I've had success in the past with RAID striping across EBS volumes: I was
able to get nearly a 4x improvement on a small instance with MySQL, so my
next step is to try the same RAID setup with Cassandra.

Another theory is that switching CommitLogSync to batch mode would let me
effectively rate-limit writes so that I don't overflow memory, since writes
wouldn't be acknowledged until the commit log has been synced to disk.
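
A cruder alternative to the same end would be to throttle on the client side:
whereas batch commit log sync paces writes on the server, something like the
sketch below caps them at the source (the 800/sec figure is just an example,
not a measured safe rate):

    # Minimal client-side rate limiter: cap the write rate so memtable
    # flushes have a chance to keep up. Drop it into the write loop above.
    import time

    class Throttle(object):
        def __init__(self, per_second):
            self.interval = 1.0 / per_second
            self.next_slot = time.time()

        def wait(self):
            now = time.time()
            if now < self.next_slot:
                time.sleep(self.next_slot - now)
            self.next_slot = max(self.next_slot, now) + self.interval

    throttle = Throttle(800)  # example cap: 800 writes/sec per client

    # in the write loop:
    #     throttle.wait()
    #     cf.insert(random_key(), {'data': payload})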

Thoughts?

- Andrew
