On Fri, Jun 24, 2011 at 3:58 PM, Philippe <watche...@gmail.com> wrote:
> A) Upon opening an SSTable for read, Cassandra samples one key in 100 to
> speed up disk access.

Close enough.

> Is the percentage configurable ?

# The Index Interval determines how large the sampling of row keys
#  is for a given SSTable. The larger the sampling, the more effective
#  the index is at the cost of space.
index_interval: 128
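
To put rough numbers on the tradeoff (purely illustrative, not measured):
with index_interval 128 and an SSTable holding 10 million keys, the
in-memory sample is about 10,000,000 / 128, roughly 78,000 entries per
SSTable.  Dropping the interval to 64 doubles the sample size, in exchange
for roughly halving the average length of the index-file scan described
below.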

> What is the relationship between this sampling and the key cache ?

None.  The key cache remembers LRU key locations.
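
If you want to size it yourself, it's a per-column-family setting.  A
minimal cassandra-cli sketch (the column family name and size here are
hypothetical):

update column family Users with keys_cached = 200000;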

> C) I want to access a key that is at the 50th position in that table,
> Cassandra will seek position 0 and then do a sequential read of the file
> from there until it finds the key, right ?

Sequential read of the index file, not the data file.
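
To make the sampling-plus-scan concrete, here is a minimal sketch in Java
(not Cassandra's actual code; the names and structures are invented for
illustration).  The on-disk index is modeled as a sorted list of
(key, data-file offset) entries, and the in-memory sample keeps every
index_interval-th key plus its position in that list, so a lookup never
scans more than index_interval index entries:

import java.util.*;

public class SampledIndexLookup {
    static final int INDEX_INTERVAL = 128;              // same idea as index_interval
    final List<String> indexKeys = new ArrayList<>();   // sorted keys in the index "file"
    final List<Long> dataOffsets = new ArrayList<>();   // matching offsets into the data file
    final NavigableMap<String, Integer> sample = new TreeMap<>();

    // Sample one key per INDEX_INTERVAL, as done when the sstable is opened.
    void buildSample() {
        for (int i = 0; i < indexKeys.size(); i += INDEX_INTERVAL)
            sample.put(indexKeys.get(i), i);
    }

    // Returns the data-file offset for key, or -1 if it is not in this sstable.
    long lookup(String key) {
        Map.Entry<String, Integer> floor = sample.floorEntry(key);
        int start = (floor == null) ? 0 : floor.getValue();
        // sequential read of the index entries, starting at the sampled position
        for (int i = start; i < indexKeys.size() && i <= start + INDEX_INTERVAL; i++) {
            int cmp = indexKeys.get(i).compareTo(key);
            if (cmp == 0) return dataOffsets.get(i);     // seek here in the data file
            if (cmp > 0) break;                          // passed it: key is absent
        }
        return -1;
    }
}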

> D) Does the data for a key immediately follow the row in the file ?

Yes.

> H) Going back to my previous example : if my keycache has 100 keys capacity,
> then I'll only have to scan the file for 1/2 the requests

Right.

> I never read a single row, but ranges of
> rows with column slices. The sizes vary.

While key cache and row cache *can* speed up range slicing, usually
you don't have enough cache capacity for this to be useful in
practice.

> J) I've considered writing a partitioner that will chunk the rows together
> so that queries for "close" rows go to the same replica on the ring. Since
> the rows have close keys, they will be close together in the file and this
> will increase OS cache efficiency.

Sounds like ByteOrderedPartitioner to me.
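
If you experiment with that, the partitioner is a cluster-wide setting in
cassandra.yaml, and it effectively can't be changed once data has been
loaded:

partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner

Keep in mind it also makes you responsible for picking balanced tokens,
since sequential keys all land on the same replicas.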

> What do you think ?

I think you should strongly consider denormalizing so that you can
read ranges from a single row instead.
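
As a rough sketch of what I mean (hypothetical column family; the
comparator and validation classes are just examples), push the "range"
into the columns of one wide row instead of across many rows:

create column family ReadingsByDay
  with comparator = LongType
  and key_validation_class = UTF8Type
  and default_validation_class = UTF8Type;

Then such a query becomes a single column slice on one row key (something
like 'sensor42:2011-06-24'), which costs one index lookup and one
contiguous read instead of a range scan over many rows.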

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
