I am also interested in seeing how Cassandra performs on various virtual platforms.

On Wed, Jul 7, 2010 at 2:15 PM, Andrew Rollins <and...@localytics.com> wrote:

> On Wed, Jul 7, 2010 at 2:27 AM, Michael Dürgner <m...@duergner.de> wrote:
>
>> Have you done some testing with small nodes already? Because from what we
>> saw trying to run IO bound services on small instances is, that their IO
>> performance is really bad compared to other instance types as you can read
>> in several blogs.
>>
>> Would be interesting to hear, if a Cassandra cluster can handle that.
>>
>
> I have actually.
>
> I tested on 10 small nodes on Amazon EC2, each with 1 EBS disk. I've been
> avoiding large nodes for now since they are 4x the cost of a small, and 10
> small would translate to 2.5 large nodes. We figured it's better to slice
> things into more nodes, since 2 or 3 nodes would mean large chunks of data
> would need to be moved if a node failed.
>
> Under pure write loads with a fairly default config and 3x replication, we
> achieved 1,000 writes per second and probably could have pushed it a little
> bit more (perhaps to 2k per second). Write speed barely slowed even as we
> pushed past 50 million keys. Keys were 255 bytes with a single column
> containing 768 bytes.
>
> Things got much worse when we introduced reads, however. We did a 50/50
> read write split. IO went up, and nodes failed a couple hours into the test
> with out of memory errors. My theory is that the reads caused much more IO,
> which caused writes to get backed up in memory.
>
> I've had success in the past with RAID striping on EBS volumes. I was able
> to get nearly 4x improvement on a small instance with MySQL, so my next
> thing would be to try RAID with Cassandra.
>
> Also, another theory is that CommitLogSync in batch mode might allow me to
> effectively rate limit writing so that I don't overflow memory.
>
> Thoughts?
>
> - Andrew
>

-- 
-Richard L. Burton III
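
For anyone who wants to sanity-check the figures quoted in the thread, here is a rough back-of-envelope sketch. It uses only the numbers Andrew reported (255-byte keys, one 768-byte column, 1,000 writes per second, 3x replication, 10 small nodes, 50 million keys, large = 4x the cost of small). The function name and dictionary keys are made up for illustration, and it deliberately ignores Cassandra's own overhead (column names, timestamps, commit log, compaction), so treat the results as lower bounds on raw payload, not measured disk or network usage.

```python
# Back-of-envelope payload math from the figures quoted in the thread.
# Assumption: only key + column value bytes are counted; all storage-engine
# overhead (column names, timestamps, commit log, compaction) is ignored.

KEY_BYTES = 255            # key size reported in the thread
VALUE_BYTES = 768          # single column value size
WRITES_PER_SEC = 1_000     # sustained write rate under pure write load
REPLICATION = 3            # replication factor
SMALL_NODES = 10           # number of small EC2 instances
TOTAL_KEYS = 50_000_000    # keys written before the test was stopped
LARGE_COST_RATIO = 4       # a large instance costs 4x a small one


def estimate():
    """Return raw payload estimates derived from the constants above."""
    row_bytes = KEY_BYTES + VALUE_BYTES
    client_bytes_per_sec = row_bytes * WRITES_PER_SEC
    # Every write lands on REPLICATION nodes.
    cluster_bytes_per_sec = client_bytes_per_sec * REPLICATION
    return {
        "row_bytes": row_bytes,
        "client_bytes_per_sec": client_bytes_per_sec,
        "cluster_bytes_per_sec": cluster_bytes_per_sec,
        "per_node_bytes_per_sec": cluster_bytes_per_sec / SMALL_NODES,
        "per_node_payload_bytes": row_bytes * TOTAL_KEYS * REPLICATION / SMALL_NODES,
        "equivalent_large_nodes": SMALL_NODES / LARGE_COST_RATIO,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,}")
```

Under these assumptions the raw write payload per node is small (a few hundred KB per second), which fits the observation that pure writes were not the problem and that the trouble started once reads competed for the same EBS disk.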