Thanks for the info. Very helpful in validating what I've been seeing. As for the scaling limit...
>> The above was single node testing. I'd expect to be able to add nodes and
>> scale throughput. Unfortunately, I seem to be running into a cap of 21,000
>> reads/s regardless of the number of nodes in the cluster.
>
> This is what I would expect if a single machine is handling all the
> Thrift requests. Are you spreading the client connections to all the
> machines?

Yes - in all tests I add all nodes in the cluster to the --nodes list. The
client requests are in fact being dispersed among all the nodes, as evidenced
by the intermittent TimedOutExceptions in the log, which show up against the
various nodes in the input list. (There's a rough sketch of the client-side
connection handling at the bottom of this message.)

Could it be a result of all the virtual nodes being hosted on the same
physical hardware? Am I running into some connection limit? I don't see
anything pegged in the JMX stats.

On Jul 17, 2010, at 9:07 AM, Jonathan Ellis wrote:

> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <o...@clearspring.com> wrote:
>> The first goal was to reproduce the test described on spyced here:
>> http://spyced.blogspot.com/2010/01/cassandra-05.html
>>
>> Using Cassandra 0.6.3, a 4GB/160GB cloud server
>> (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with
>> default storage-conf.xml and cassandra.in.sh, here's what I got:
>>
>> Reads: 4,800/s
>> Writes: 9,000/s
>>
>> Pretty close to the result posted on the blog, with slightly lower write
>> performance (perhaps due to the availability of only a single disk for both
>> commitlog and data).
>
> You're getting as close as you are because you're comparing 0.6
> numbers with 0.5. For 0.6 on the test machine used in the blog post
> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
>
> In our tests we saw a 5-15% performance penalty from adding a
> virtualization layer. Things like only having a single disk are going
> to stack on top of that.
>
>> The above was single node testing. I'd expect to be able to add nodes and
>> scale throughput. Unfortunately, I seem to be running into a cap of 21,000
>> reads/s regardless of the number of nodes in the cluster.
>
> This is what I would expect if a single machine is handling all the
> Thrift requests. Are you spreading the client connections to all the
> machines?
>
>> The disk performance of the cloud servers has been extremely spotty... Is
>> this normal for the cloud?
>
> Yes.
>
>> And if so, what's the solution re Cassandra?
>
> The larger the instance you're using, the closer you are to having the
> entire machine, meaning fewer other users are competing with you for
> disk i/o.
>
> Of course, when you're renting the entire machine's worth, it can be
> more cost-effective to just use dedicated hardware.
>
>> However, Cassandra routes to the nearest node topologically and not to the
>> best performing one, so "bad" nodes will always result in high-latency reads.
>
> Cassandra routes reads around nodes with temporarily poor performance
> in 0.7, btw.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
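

For reference, the clients spread their connections roughly as in the sketch
below - a minimal outline only, assuming the Thrift-generated Python bindings
shipped with the 0.6 interface (cassandra.Cassandra), the default RPC port
9160, and a buffered (unframed) transport, which I believe matches the stock
storage-conf.xml. The node IPs, keys_to_read, and do_read() are placeholders
for the actual workload (py_stress does the equivalent internally with its
--nodes list):

import itertools

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra   # Thrift-generated bindings from the 0.6 interface/ dir

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # placeholder node IPs

def connect(host, port=9160):
    # Buffered (unframed) transport - matches the default Thrift setting, I believe.
    socket = TSocket.TSocket(host, port)
    transport = TTransport.TBufferedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Cassandra.Client(protocol)
    transport.open()
    return client

# One connection per node, round-robined so that no single node ends up
# terminating all of the Thrift traffic.
clients = [connect(host) for host in NODES]
pool = itertools.cycle(clients)

for key in keys_to_read:            # keys_to_read: whatever the test workload is
    client = next(pool)
    do_read(client, key)            # placeholder for the actual get()/multiget_slice() call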