On Fri, Jun 24, 2011 at 3:58 PM, Philippe <watche...@gmail.com> wrote:
> A) Upon opening an SSTable for read, Cassandra samples one key in 100 to
> speed up disk access.
Close enough.

> Is the percentage configurable ?

Yes; see index_interval in cassandra.yaml:

    # The Index Interval determines how large the sampling of row keys
    # is for a given SSTable. The larger the sampling, the more effective
    # the index is at the cost of space.
    index_interval: 128

> What is the relationship between this sampling and the key cache ?

None. The key cache remembers LRU key locations.

> C) I want to access a key that is at the 50th position in that table,
> Cassandra will seek position 0 and then do a sequential read of the file
> from there until it finds the key, right ?

Sequential read of the index file, not the data file.

> D) Does the data for a key immediately follow the row in the file ?

Yes.

> H) Going back to my previous example : if my keycache has 100 keys capacity,
> then I'll only have to scan the file for 1/2 the requests

Right.

> I never read a single row but ranges of
> rows with column slices. The sizes are varying.

While the key cache and row cache *can* speed up range slicing, in
practice you usually don't have enough cache capacity for this to be
useful.

> J) I've considered writing a partitioner that will chunk the rows together
> so that queries for "close" rows go to the same replica on the ring. Since
> the rows have close keys, they will be close together in the file and this
> will increase OS cache efficiency.

Sounds like ByteOrderedPartitioner to me.

> What do you think ?

I think you should strongly consider denormalizing so that you can read
ranges from a single row instead.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
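P.S. For anyone following along, the sample-then-scan lookup described
above (keep every index_interval-th key in memory, binary-search that
sample, then read the index sequentially from the sampled position) can
be sketched in Python. This is a toy model, not Cassandra's actual code;
build_sample and lookup are illustrative names, and index_entries stands
in for the on-disk index file as a sorted list of (key, data offset)
pairs:

```python
import bisect

INDEX_INTERVAL = 128  # mirrors the index_interval default above

def build_sample(index_entries):
    """Keep every INDEX_INTERVAL-th key plus its position in the index,
    mimicking the in-memory index sample built when an SSTable is opened."""
    sample = []
    for i, (key, _) in enumerate(index_entries):
        if i % INDEX_INTERVAL == 0:
            sample.append((key, i))  # (sampled key, position in index)
    return sample

def lookup(index_entries, sample, key):
    """Binary-search the sample for the greatest sampled key <= key, then
    scan the index sequentially from that position (at most INDEX_INTERVAL
    entries) to find the key's offset in the data file."""
    keys = [k for k, _ in sample]
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None  # key sorts before the first sampled key
    start = sample[i][1]
    for k, offset in index_entries[start:start + INDEX_INTERVAL]:
        if k == key:
            return offset
    return None  # not in this SSTable
```

A key cache hit would skip lookup() entirely, since the cache remembers
the data-file offset for recently read keys; the sample only bounds how
much of the index must be scanned on a cache miss.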