Hi Jonathan,
Thanks for the answer. I wanted to report on the improvements I got, since
someone else is bound to run into the same questions...


> > C) I want to access a key that is at the 50th position in that table,
> > Cassandra will seek position 0 and then do a sequential read of the file
> > from there until it finds the key, right ?
> Sequential read of the index file, not the data file.
>
And then it will seek directly to the right position in the data file?


> > J) I've considered writing a partitioner that will chunk the rows together
> > so that queries for "close" rows go to the same replica on the ring. Since
> > the rows have close keys, they will be close together in the file and this
> > will increase OS cache efficiency.
> Sounds like ByteOrderedPartitioner to me.
>
I indeed ended up using just that.
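For anyone reproducing this: the partitioner is a cluster-wide setting and has
to be chosen before any data is loaded, since you can't switch partitioners on
an existing cluster without reloading everything. In my case that's just the
partitioner line in the config (cassandra.yaml here; adapt to your version):

    partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner

The usual caveat applies: with an order-preserving partitioner you need to pick
your node tokens to match the key distribution, or the load ends up unbalanced.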


> > What do you think ?
>  I think you should strongly consider denormalizing so that you can
> read ranges from a single row instead.
>
Yes, that's what I did: I took a hard look at the data and the access
pattern and sliced away everything I could.

Given that I am storing data in a quadtree and that I have strong locality
in my read pattern, I ended up using the Morton (z-order) code as the key
and using super columns to fetch only the column groups I'm interested in.
I gave some thought to how to balance the tree, because I have 10 different
levels of data in the quadtree, and I am doing tricks with shifts to reuse
the same prefixes in the keys (see the sketch below).
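In case it helps someone, here is a minimal sketch of the kind of key
computation I mean (the names and the 32-bit cell coordinates are just
illustrative, not my actual code):

    // Build a 64-bit Morton (z-order) code by interleaving the bits of the
    // x and y cell coordinates, and shift it to recover an ancestor's prefix.
    public final class MortonKey {

        // Interleave the bits of x and y: x takes the even bits, y the odd bits.
        public static long interleave(int x, int y) {
            return spread(x) | (spread(y) << 1);
        }

        // Spread the 32 bits of v so they occupy the even bit positions of a long.
        private static long spread(int v) {
            long x = v & 0xFFFFFFFFL;
            x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
            x = (x | (x << 8))  & 0x00FF00FF00FF00FFL;
            x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FL;
            x = (x | (x << 2))  & 0x3333333333333333L;
            x = (x | (x << 1))  & 0x5555555555555555L;
            return x;
        }

        // Each quadtree level costs 2 bits, so a coarser cell is just the
        // high-order prefix of its descendants' codes: shifting right by
        // 2 * (maxLevel - level) maps a deep key back onto its ancestor.
        public static long parentPrefix(long code, int maxLevel, int level) {
            return code >>> (2 * (maxLevel - level));
        }
    }

Since neighbouring cells share long prefixes, keys that are close in space end
up close together on the byte-ordered ring (as long as the codes are encoded in
a fixed-width, big-endian form so byte order matches numeric order), which is
what makes the partitioner choice above pay off.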

What I think is worth noting for others on the mailing list is that doing
this resulted in a 50x to 100x increase in read performance, and my I/O is now
down to virtually nothing (I can basically watch the OS load the pages into
its cache).
I also found out that one big multiget is more efficient than a couple of range
queries in my case.
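To make the multiget point concrete, this is roughly what the batched read
looks like with the Hector client (just the client I'm using for the example;
the "Points" column family, the long keys and the serializers are placeholders
for whatever your schema uses, and with super columns you would use the
super-column variant of the same query):

    // Rough Hector sketch: fetch a batch of Morton keys in a single multiget
    // instead of issuing several range queries.
    import java.util.List;

    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Rows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSliceQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    public class BatchRead {

        // Fetch up to `count` columns for each of the given keys in one round trip.
        public static Rows<Long, String, byte[]> fetch(Keyspace ks, List<Long> mortonKeys, int count) {
            MultigetSliceQuery<Long, String, byte[]> q = HFactory.createMultigetSliceQuery(
                    ks, LongSerializer.get(), StringSerializer.get(), BytesArraySerializer.get());
            q.setColumnFamily("Points");                               // placeholder CF name
            q.setKeys(mortonKeys.toArray(new Long[mortonKeys.size()]));
            q.setRange("", "", false, count);                          // empty bounds = all columns, capped at count
            QueryResult<Rows<Long, String, byte[]>> result = q.execute();
            return result.get();
        }
    }

Since the keys can be computed up front from the query region, the whole read
is one round trip instead of a series of range scans.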

So:
 - instead of a steady rate of 280-350 MB/s of disk reads, I get 100 MB/s every
so often
 - instead of seeing my cluster melt down at 3 concurrent clients, it's now
speeding along just fine at 50 concurrent clients

:)
