Thanks, Radim. Actually, 100 reads per second is achievable even with 2 disks;
the issue is achieving them with a really low average latency per key.

I am wondering if anyone has played with index_interval, and how much of a
difference reducing it would make to reads. I am thinking of devoting a
machine with 32 GB of RAM to this node and decreasing index_interval from 128
to 8.

For 500 million keys, this would mean 500 million / 8 ≈ 62.5 million sampled
keys in memory.

index overhead = 62.5 million * (32 + avg key size)
(http://www.datastax.com/docs/1.0/cluster_architecture/cluster_planning)
My avg key size is 8 bytes, hence
overhead ≈ 62.5 million * 40 bytes ≈ 2.5 GB (is this number the same as the
in-memory size?).
If yes, then it's not too bad, and it eliminates the index disk read for a
large majority of the keys.
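
As a quick sanity check on that arithmetic, here is a minimal Python sketch,
assuming the formula from the DataStax planning doc linked above; the function
name and the numbers plugged in are just my own assumptions:

# Rough estimate of index sample memory, per the DataStax cluster-planning doc:
#   overhead ~= (total keys / index_interval) * (32 + average key size in bytes)
def index_sample_overhead_bytes(total_keys, index_interval, avg_key_size_bytes):
    samples = total_keys / index_interval        # keys actually sampled into memory
    return samples * (32 + avg_key_size_bytes)   # ~32 bytes of fixed overhead per sample

overhead = index_sample_overhead_bytes(
    total_keys=500_000_000,     # ~500 million keys (assumed)
    index_interval=8,           # reduced from 128 (assumed)
    avg_key_size_bytes=8,       # my average key size (assumed)
)
print("~%.2f GB of index samples" % (overhead / 1e9))   # prints ~2.50 GB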

Also, my rows uniformly have 2 columns each. Will sstable compression help my
reads in any way?
Thanks
Gurpreet





On Fri, May 18, 2012 at 6:19 AM, Radim Kolar <h...@filez.com> wrote:

> To get 100 random reads per second on a large dataset (100 GB) you need more
> than 2 disks in RAID 0.
> Better to add more nodes than to stick too many disks into one node. You also
> need to adjust the I/O scheduler in the OS.
>
