Hi, first of all, I am not a Cassandra hater :) and I do not expect miracles either :) I'm checking whether there is any scalable solution we could use instead of a sharding setup over MySQL or Tokyo Tyrant. Our system currently runs fine on a single Tokyo Tyrant DB, but we expect a large increase in traffic in a couple of months because of new features we plan to implement.
So we either have to reimplement our data access tier in the application to support sharding for MySQL/TT, or change the data model to use Cassandra (which is not very complicated). Also important: our DB admins know MySQL _well_ (years of experience), but mostly on not-so-"huge" installations, so they _might_ not be aware of problems that only show up at large scale.

I am researching Cassandra because I wanted to check whether there is a "better" solution than MySQL sharding in terms of:

- performance (it should not be much worse than sharded MySQL and should scale linearly; we expect no more than 10K inserts per second of writes, and probably no more than 1K reads/s, which will be mostly random)
- ease of maintenance (not only in terms of adding new machines)
- ability to store large amounts of data (it currently looks like we will have about 50GB of uncompressed data per day)

On Sat, Sep 18, 2010 at 3:46 AM, Benjamin Black <b...@b3k.us> wrote:
> It appears you are doing several things that assure terrible
> performance, so I am not surprised you are getting it.
>

Ok, let's see... explanations go below.

> On Tue, Sep 14, 2010 at 3:40 PM, Kamil Gorlo <kgs4...@gmail.com> wrote:
>> My main tool for benchmarks was stress.py (or an equivalent written in
>> C++ to work around Python 2.5's lack of multiprocessing). I will focus only
>> on reads (random with normal distribution, which is the default in
>> stress.py) because writes were /quite/ good.
>>
>> I have 8 machines (Xen guests with a dedicated pair of 2TB SATA disks
>> combined in RAID-0 for every guest). Every machine has 4 individual
>> cores at 2.4 GHz and 4GB RAM.
>>
>
> First problem: I/O in Xen is very poor and Cassandra is generally very
> sensitive to I/O performance.

Heh, I thought all DBs are sensitive to I/O performance :) Unfortunately this probably cannot be changed; we use Xen in our company for now (also for the MySQL machines).

>> Cassandra commitlog and data dirs were on the same disk,
>
> This is not recommended if you want best performance. You should have
> a dedicated commitlog drive.

Of course, but I was testing only reads, so the commitlog is not affected, am I right?

>> I gave 2.5GB of heap to Cassandra; key and row caches were disabled (standard
>> Keyspace1 schema, all tests use the Standard1 CF). All other options were
>> defaults. I disabled the caches because I was testing random (or semi-random,
>> normal distribution) reads, so they wouldn't help much (and
>> also because 4GB of RAM is not a lot).
>>
>
> Disabling row cache in this case makes sense, but disabling key cache
> is probably hurting your performance quite a bit. If you wrote 20GB
> of data per node, with narrow rows as you describe, and had default
> memtable settings, you now have a huge number of sstables on disk.
> You did not indicate you use nodetool compact to trigger a major
> compaction, so I'm assuming you did not.

I disabled the key cache because when it was enabled (for 100% of keys) it didn't help much (the improvement was no more than 5 reads/s, if any). So I decided to give the RAM to the Linux page cache instead.
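For reference, the cache settings in 0.6 live in storage-conf.xml; the relevant fragment of my config looked roughly like this (reconstructed from memory, so treat it as a sketch rather than the exact file):

    <Keyspace Name="Keyspace1">
      <!-- key and row cache both disabled for the read tests;
           for the key-cache run I set KeysCached="100%" instead -->
      <ColumnFamily Name="Standard1"
                    CompareWith="BytesType"
                    KeysCached="0"
                    RowsCached="0"/>
    </Keyspace>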
>> For the first test I installed Cassandra on only one machine, to test it
>> and keep the results for later comparison with the larger cluster and
>> other DBs.
>>
>> 1) RF was set to 1. I inserted ~20GB of data (this is the number
>> reported in the Load column of nodetool ring output) using stress.py
>> (100 columns per row). Then I tested reads and got 200 rows/second
>> (reading 100 columns per row, CL=ONE; disks were the bottleneck, util was
>> 100%). There was no other operation pending during the reads (compaction,
>> insertion, etc.).
>>
>
> This is normal behavior under random reads for _any_ database. If
> the dataset can't fit in RAM, you are I/O bound. I don't know why you
> would expect anything else. You did not indicate your disk access
> mode, but if it is mmap and you are not using code that calls
> mlockall, then with that size dataset you are almost certainly
> swapping, as well. You can check that with vmstat.

I am not complaining that Cassandra is using I/O, that is perfectly understandable :) I am using Cassandra 0.6.4 with mmap_index_only, and I am not swapping.

> Given the combination of very little RAM in comparison to the data
> set, very little disk I/O, key caching disabled, a large number of
> sstables, and likely mmap I/O without mlockall, you have created about
> the worst possible setup. If you are _actually_ dealing with that
> much data AND random reads, then you either need enough RAM to hold it
> all, or you need SSDs. And that is not specific to Cassandra.
>
> If you are saying you have similarly misconfigured MySQL and still
> gotten better performance, then kudos. You are very lucky.

Where is my Cassandra misconfigured? I gave MySQL exactly the same environment in terms of RAM, disk, and OS.

>
> b
>

Cheers,
Kamil
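PS: In case it helps anyone reproduce the setup: the disk access mode is also set in storage-conf.xml, and I checked for swapping with vmstat. Roughly (again a sketch from memory, not a copy of the real config):

    <!-- storage-conf.xml: mmap only the index files, use standard I/O for the data files -->
    <DiskAccessMode>mmap_index_only</DiskAccessMode>

and on the node itself:

    # watch the si/so columns; they stay at 0, i.e. the node is not swapping
    vmstat 1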