The key is that while Cassandra may read fewer rows per second than MySQL when you are I/O bound (as you are here), because of SSTable merging (see http://wiki.apache.org/cassandra/MemtableSSTable), you should be using your Cassandra rows as materialized views, so that each query is a single row lookup rather than many.
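Roughly, the denormalization looks like this (a minimal sketch using the pycassa client against the stock Keyspace1/Standard1 schema; the row key and column names here are made up for illustration):

import pycassa

# Sketch: everything one query needs lives under a single row key,
# written out at insert time (names below are hypothetical).
pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Standard1')

# Write side: denormalize the whole "view" into one row,
# one column per item, instead of spreading it over many rows.
cf.insert('user42:dashboard', {
    'item:0001': 'first value',
    'item:0002': 'second value',
    # ... up to ~100 columns, like the rows in your test
})

# Read side: the entire view comes back from one row lookup,
# rather than from many separate row reads.
view = cf.get('user42:dashboard', column_count=100)

That way the per-row read cost you're measuring is paid once per query instead of once per logical item.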
On Tue, Sep 14, 2010 at 5:40 PM, Kamil Gorlo <kgs4...@gmail.com> wrote:
> Hey,
>
> we are considering using Cassandra for a quite large project, and because
> of that I made some tests with Cassandra. I was testing performance and
> stability mainly.
>
> My main tool for benchmarks was stress.py (or an equivalent written in
> C++ to deal with python2.5's lack of multiprocessing). I will focus only
> on reads (random with normal distribution, which is the default in
> stress.py) because writes were /quite/ good.
>
> I have 8 machines (Xen guests with a dedicated pair of 2TB SATA disks
> combined in RAID-0 for every guest). Every machine has 4 individual
> cores of 2.4 GHz and 4GB RAM.
>
> Cassandra commitlog and data dirs were on the same disk, I gave Cassandra
> a 2.5GB heap, and key and row caches were disabled (standard Keyspace1
> schema, all tests use the Standard1 CF). All other options were defaults.
> I disabled the caches because I was testing random (or semi-random -
> normal distribution) reads, so they wouldn't help much (and also because
> 4GB of RAM is not a lot).
>
> For the first test I installed Cassandra on only one machine, to test it
> and record results for later comparison with the large cluster and
> other DBs.
>
> 1) RF was set to 1. I inserted ~20GB of data (this is the number reported
> in the load column of nodetool ring output) using stress.py (100 columns
> per row). Then I tested reads and got 200 rows/second (reading 100
> columns per row, CL=ONE; the disks were the bottleneck, util was 100%).
> There was no other operation pending during the reads (compaction,
> insertion, etc.).
>
> 2) So I moved to a bigger cluster, with 8 machines and RF set to 2. I
> inserted about ~20GB of data per node (so 20 GB * 8 / 2 = 80GB of "real
> data"). Then I tested reads exactly the same way as before and got about
> 450 rows/second (reading 100 columns, although reading only 1 in fact
> makes no difference; CL=ONE; the disks on every machine were at 100%
> util because of the random reads).
>
> 3) Then I changed RF from 2 to 3 on the cluster described in 2), so I
> ended up with every node loaded with about 30GB of data. Then, as usual,
> I tested reads and got only 300 rows/second from the whole cluster
> (100% util on every disk).
>
> 4) The last test was with RF=3 as before, but I inserted even more data,
> so every node in the 8-machine cluster had ~100GB of data (8 * 100GB / 3
> = 266GB of real data). In this case I got only 125 rows/second.
>
> I was using multiple processes and machines to test reads.
>
> *So my question is why these numbers are so low? What is especially
> surprising for me is that changing RF from 2 to 3 drops performance from
> 450 to 300 reads per second. Is this because of read repair?*
>
> PS. To compare Cassandra's performance with other DBs, I also tested
> MySQL with almost the same data (one table with two columns, key (int PK)
> and value (VARCHAR(500)), simulating 100 Cassandra columns for a single
> row). MySQL was installed on the same machine as the Cassandra node from
> test 1) (which is one of the 8 machines described before). I inserted
> some data and then tested random reads (which was even worse for caching,
> because I used the standard rand() from C++ to generate keys, not a
> normal distribution). Here are the results:
>
> size of data in db -> reads per second
> 21 GB -> 340
> 400 GB -> 200
>
> So I got more reads from a single MySQL with 400GB of data than from
> 8 machines storing about 266GB. This doesn't look good. What am I doing
> wrong? :)
>
> Cheers,
> Kamil

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com