Hi Jonathan,

Thanks for the answer. I wanted to report on the improvements I got, because someone else is bound to run into the same questions...

> > C) I want to access a key that is at the 50th position in that table,
> > Cassandra will seek position 0 and then do a sequential read of the file
> > from there until it finds the key, right ?
> Sequential read of the index file, not the data file.

And then it will seek directly to the right position in the data file ?

> > J) I've considered writing a partitioner that will chunk the rows together
> > so that queries for "close" rows go to the same replica on the ring. Since
> > the rows have close keys, they will be close together in the file and this
> > will increase OS cache efficiency.
> Sounds like ByteOrderedPartitioner to me.

I indeed ended up using just that.

> > What do you think ?
> I think you should strongly consider denormalizing so that you can
> read ranges from a single row instead.

Yes, that's what I did: I took a hard look at the data and the access pattern and sliced away at everything I could. Given that I am storing data in a quadtree and that I have strong locality in my read pattern, I ended up using the Morton (z-order) code as the key and using super-columns to fetch only the column groups I'm interested in. I gave some thought to how to balance the tree, because I have 10 different levels of data in the quadtree and I am doing tricks with shifts to reuse the same prefixes in the keys.

What I think is worth noting for others on the mailing list is that doing this resulted in a 50x to 100x increase in read performance, and my IO is now down to virtually nothing (I can basically see the OS load up the pages in its cache).

I also found out that one big multiget is more efficient than a couple of range queries in my case.

So:
- instead of a steady rate of 280 to 350 MB/s of disk reads, I get 100 MB/s every so often
- instead of seeing my cluster melt down at 3 concurrent clients, it's now speeding along just fine at 50 concurrent clients :)
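
To illustrate the Morton-key idea described above, here is a minimal Python sketch. It is not the code actually used on the cluster: the key layout, the MAX_LEVEL constant and the helper names interleave_bits and row_key are all assumptions made up for the example. It only shows how bit interleaving plus a shift can make a quadtree cell and its descendants share the same key prefix.

```python
MAX_LEVEL = 10  # assumption: matches the 10 quadtree levels mentioned above


def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into a Morton (z-order) code.

    Bit i of x goes to bit 2*i, bit i of y goes to bit 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def row_key(x: int, y: int, level: int) -> bytes:
    """Hypothetical row key for the cell (x, y) at a given quadtree level.

    The Morton code is shifted left so that it is aligned to MAX_LEVEL,
    which makes a cell and all of its descendants share the same leading
    bits. The level is appended as a final byte to keep keys unique.
    """
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level out of range")
    if not (0 <= x < (1 << level) and 0 <= y < (1 << level)):
        raise ValueError("cell coordinates out of range for this level")
    code = interleave_bits(x, y, level)
    aligned = code << (2 * (MAX_LEVEL - level))
    # 2 * MAX_LEVEL = 20 bits fit in 3 bytes; big-endian keeps byte order
    # consistent with numeric order.
    return aligned.to_bytes(3, "big") + bytes([level])


if __name__ == "__main__":
    parent = row_key(1, 1, 1)
    child = row_key(2, 2, 2)  # one of the four children of (1, 1) at level 1
    print(parent.hex(), child.hex())
```

With a byte-ordered partitioner, keys built this way sort so that spatially close cells tend to end up close together, which is the locality effect described above. How the real keys are laid out (and how the levels are balanced) is not specified in this message, so treat the layout here as one possible choice.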