Thanks, Radim. Actually, 100 reads per second is achievable even with 2 disks; the problem is achieving them with a really low average latency per key.
I am wondering whether anyone has played with index_interval, and how much of a difference reducing it makes to reads. I am thinking of devoting a 32 GB RAM machine to this node and decreasing index_interval from 128 to 8. For 500 million keys, that means 500M / 8 ~ 62.5 million sampled keys in memory. Per the cluster-planning guide ( http://www.datastax.com/docs/1.0/cluster_architecture/cluster_planning ), index overhead = 62.5 million * (32 + avg key size); my avg key size is 8, hence overhead = 62.5 million * 40 ~ 2.5 GB. Is this number the same as the in-memory size? (I've put a rough sketch of this arithmetic at the bottom of this mail.) If so, that's not too bad, and it eliminates the index disk read for a large majority of the keys.

Also, my rows uniformly have 2 columns. Will sstable compression help my reads in any way?

Thanks,
Gurpreet

On Fri, May 18, 2012 at 6:19 AM, Radim Kolar <h...@filez.com> wrote:
> to get 100 random reads per second on large dataset (100 GB) you need more
> disks in raid 0 then 2.
> Better is to add more nodes then stick too much disks into node. You need
> also adjust io scheduler in OS.
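
P.S. The back-of-the-envelope calculation I'm referring to above, as a small Python sketch. The 32-byte per-entry constant is the figure from the DataStax 1.0 cluster-planning page linked above; the key count and average key size are my own estimates for this node, so treat the result as a rough order-of-magnitude number only.

# Rough estimate of index sample memory after lowering index_interval
# (set in cassandra.yaml) from the default 128 down to 8.
total_keys     = 500e6   # keys on this node (my estimate)
index_interval = 8       # proposed setting (default is 128)
avg_key_size   = 8       # bytes, my average row key size
entry_overhead = 32      # bytes per sampled entry, per DataStax 1.0 docs

sampled_entries = total_keys / index_interval                  # ~62.5 million
index_mem_bytes = sampled_entries * (entry_overhead + avg_key_size)

print("sampled entries: %.1f million" % (sampled_entries / 1e6))
print("index sample memory: %.2f GB" % (index_mem_bytes / 1e9))  # ~2.5 GB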