Thanks for the update, that is very useful!

On Tue, Jul 12, 2011 at 3:16 PM, Philippe <watche...@gmail.com> wrote:
> Hi Jonathan,
> Thanks for the answer. I wanted to report on the improvements I got,
> because someone else is bound to run into the same questions...
>
>> > C) If I want to access a key that is at the 50th position in that
>> > table, Cassandra will seek to position 0 and then do a sequential
>> > read of the file from there until it finds the key, right?
>> Sequential read of the index file, not the data file.
>
> And then it will seek directly to the right position in the data file?
>
>> > J) I've considered writing a partitioner that will chunk the rows
>> > together so that queries for "close" rows go to the same replica on
>> > the ring. Since the rows have close keys, they will be close together
>> > in the file and this will increase OS cache efficiency.
>> Sounds like ByteOrderedPartitioner to me.
>
> I indeed ended up using just that (config sketch below, after the
> message).
>
>> > What do you think?
>> I think you should strongly consider denormalizing so that you can
>> read ranges from a single row instead.
>
> Yes, that's what I did: I took a hard look at the data and the access
> pattern and sliced away at everything I could.
> Given that I am storing data in a quadtree and that I have strong
> locality in my read pattern, I ended up using the Morton (z-order) code
> as the key, and using super columns so that I only fetch the column
> groups I'm interested in.
> I gave some thought to how to balance the tree, because I have 10
> different levels of data in the quadtree, and I am doing tricks with
> shifts to reuse the same prefixes in the keys (encoding sketch below,
> after the message).
> What I think is worth noting for others on the mailing list is that
> doing this resulted in a 50x to 100x increase in read performance, and
> my I/O is now down to virtually nothing (I can basically watch the OS
> load the pages into its cache).
> I also found that one big multiget is more efficient than a couple of
> range queries in my case (comparison sketch below, after the message).
> So:
> - instead of a steady rate of 280-350 MB/s of disk reads, I get
> 100 MB/s every so often
> - instead of seeing my cluster melt down at 3 concurrent clients, it's
> now speeding along just fine at 50 concurrent clients
> :)
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
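
For readers who want to try the ByteOrderedPartitioner setup discussed
above: the partitioner is chosen cluster-wide in cassandra.yaml and
cannot be changed once data has been written. A minimal sketch; every
other setting stays at its default:

    # cassandra.yaml -- store row keys in raw byte order so that rows
    # with adjacent keys land on adjacent token positions. Range queries
    # then read contiguous data, but balancing the ring becomes your
    # job, since keys are no longer hashed evenly across nodes.
    partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner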
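
Philippe doesn't show his key layout, but Morton (z-order) encoding is
standard bit-interleaving, and the prefix-sharing shift trick he alludes
to falls out of it. A minimal Python sketch, assuming (x, y) are the
cell coordinates at the deepest quadtree level; the names here are
illustrative, not from the thread:

    def morton_encode(x, y, levels=10):
        """Interleave the bits of x and y (levels bits each) into one
        code; spatially close cells get numerically close keys."""
        code = 0
        for i in range(levels):
            code |= ((x >> i) & 1) << (2 * i)      # even bit positions <- x
            code |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions  <- y
        return code

    def parent_code(code):
        """Dropping two bits moves up one quadtree level, so a cell's
        key shares a prefix with all of its ancestors."""
        return code >> 2

    # Horizontally adjacent cells get nearby codes...
    assert morton_encode(5, 9) == 147 and morton_encode(6, 9) == 150
    # ...and sibling cells share the same code one level up.
    assert parent_code(morton_encode(4, 9)) == parent_code(morton_encode(5, 9))

Because keys that are close in space are also close in byte order (once
serialized fixed-width, big-endian), a ByteOrderedPartitioner places
them on the same replica and near each other on disk, which is exactly
the locality effect described above.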
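
On the multiget-versus-range-queries point, here is a sketch of the two
access patterns with pycassa, the Python client current at the time;
the keyspace, column family, host, and keys are all hypothetical:

    import pycassa

    pool = pycassa.ConnectionPool('SpatialKS', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'QuadTree')

    # Keys of the quadtree cells covering the query region (hypothetical).
    wanted = ['0012', '0013', '0021', '0102']

    # One big multiget: a single client round trip; the coordinator
    # fans the request out to the right replicas in parallel.
    rows = cf.multiget(wanted)

    # The alternative: one range scan per contiguous run of keys (this
    # only makes sense under an order-preserving partitioner such as
    # ByteOrderedPartitioner).
    run1 = dict(cf.get_range(start='0012', finish='0013'))
    run2 = dict(cf.get_range(start='0021', finish='0021'))

Which approach wins depends on how fragmented the key set is;
Philippe's report is that the single multiget came out ahead for his
workload.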