In fact, with my cassandra-0.6.2, I can only get about 40~50 reads/s with the Key/Row cache disabled.
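For reference, disabling both caches in a 0.6-style storage-conf.xml looks roughly like the sketch below; the keyspace/ColumnFamily names are just the stock examples, and the exact attributes may differ between 0.6.x releases:

    <Keyspace Name="Keyspace1">
      <!-- Key and row caches set to zero entries, i.e. disabled,
           for the benchmark run (example CF only) -->
      <ColumnFamily Name="Standard1" CompareWith="BytesType"
                    KeysCached="0" RowsCached="0"/>
    </Keyspace>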
On Sun, Jul 18, 2010 at 1:02 AM, Schubert Zhang <zson...@gmail.com> wrote:
> Hi Jonathan,
> The 7k reads/s is very high, could you please explain more about your
> benchmark?
>
> 7,000 reads/s means the average latency of each read is only 0.143ms.
> Considering the 2 disks in the benchmark, that may be 0.286ms per disk.
>
> But in most random-read applications on a very large dataset, the OS cache
> and the Cassandra Key/Row cache are not so effective. So I guess that for a
> random-read test on a large dataset (such as 1TB), the result may not be so
> good.
>
>
> On Sat, Jul 17, 2010 at 9:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <o...@clearspring.com>
>> wrote:
>> > The first goal was to reproduce the test described on spyced here:
>> http://spyced.blogspot.com/2010/01/cassandra-05.html
>> >
>> > Using Cassandra 0.6.3, a 4GB/160GB cloud server
>> (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing)
>> with the default storage-conf.xml and cassandra.in.sh, here's what I got:
>> >
>> > Reads: 4,800/s
>> > Writes: 9,000/s
>> >
>> > Pretty close to the result posted on the blog, with slightly lower
>> write performance (perhaps due to the availability of only a single disk
>> for both commitlog and data).
>>
>> You're getting as close as you are because you're comparing 0.6
>> numbers with 0.5. For 0.6 on the test machine used in the blog post
>> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
>>
>> In our tests we saw a 5-15% performance penalty from adding a
>> virtualization layer. Things like only having a single disk are going
>> to stack on top of that.
>>
>> > The above was single-node testing. I'd expect to be able to add nodes
>> and scale throughput. Unfortunately, I seem to be running into a cap of
>> 21,000 reads/s regardless of the number of nodes in the cluster.
>>
>> This is what I would expect if a single machine is handling all the
>> Thrift requests. Are you spreading the client connections across all the
>> machines?
>>
>> > The disk performance of the cloud servers has been extremely spotty...
>> Is this normal for the cloud?
>>
>> Yes.
>>
>> > And if so, what's the solution re Cassandra?
>>
>> The larger the instance you're using, the closer you are to having the
>> entire machine, meaning fewer other users are competing with you for
>> disk i/o.
>>
>> Of course, when you're renting the entire machine's worth, it can be
>> more cost-effective to just use dedicated hardware.
>>
>> > However, Cassandra routes to the nearest node topologically and not to
>> the best-performing one, so "bad" nodes will always result in high-latency
>> reads.
>>
>> Cassandra routes reads around nodes with temporarily poor performance
>> in 0.7, btw.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
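P.S. Regarding the question above about spreading the client connections: below is a minimal sketch of what that can look like from a single Python client, assuming the Thrift-generated 0.6 bindings and the stock Keyspace1/Standard1 schema. The node addresses, key range, and column name are placeholders, and a real benchmark would run many such loops in parallel across threads or processes:

    import random
    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import ColumnPath, ConsistencyLevel, NotFoundException

    NODES = ['10.0.0.1', '10.0.0.2', '10.0.0.3']  # placeholder node addresses
    NUM_KEYS = 1000000                            # keys assumed already inserted
    NUM_READS = 100000

    def connect(host):
        # One buffered Thrift connection per node (0.6 uses unframed transport).
        transport = TTransport.TBufferedTransport(TSocket.TSocket(host, 9160))
        transport.open()
        return Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))

    clients = [connect(h) for h in NODES]
    path = ColumnPath(column_family='Standard1', column='c1')

    start = time.time()
    for _ in xrange(NUM_READS):
        key = str(random.randint(0, NUM_KEYS - 1))
        client = random.choice(clients)  # pick a coordinator node at random
        try:
            # 0.6 Thrift API: the keyspace is passed on each call
            client.get('Keyspace1', key, path, ConsistencyLevel.ONE)
        except NotFoundException:
            pass
    elapsed = time.time() - start
    print '%d reads in %.1fs => %.0f reads/s' % (NUM_READS, elapsed, NUM_READS / elapsed)

The point is only that every node in NODES acts as a coordinator for a share of the requests, so no single machine has to handle all of the Thrift traffic.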