To find data on disk, Cassandra keeps a bloom filter in memory for each SSTable file. Per the docs, 1 billion rows takes roughly 2 GB of RAM, so the memory footprint depends heavily on your row count. As you add more rows, you may need to raise the bloom filter false-positive chance so the filters use less RAM, but that means slower reads. I.e. as you add more rows, reads on a single machine get slower.
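As a rough sanity check (this is just the textbook bloom filter formula, not necessarily the exact sizing Cassandra uses internally), an optimal filter needs about -ln(p) / (ln 2)^2 bits per key, where p is the false-positive chance:

  p = 0.000744  ->  ~15 bits/row   ->  ~1.9 GB for 1 billion rows
  p = 0.1       ->  ~4.8 bits/row  ->  ~0.6 GB for 1 billion rows

That lines up with the ~2 GB per billion rows figure above and shows roughly how much RAM raising the false-positive chance buys back.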
We hit the RAM limit on one machine at 1 billion rows, so we are in the process of raising that ratio from 0.000744 (the default) to 0.1 to buy us more time to solve it (a rough sketch of the CQL for this is at the bottom of this mail). Since we see essentially no I/O load on our machines, we also plan to move to leveled compaction, where 0.1 is the default in newer releases; I think the new size-tiered default is 0.01. I.e. if you store more data per row this is less of an issue, but it is still something to consider. (Rows also have a limit on data size, I think, but I am not sure what it is. I know the column limit on a row is in the millions, somewhere below 10 million.)

Later,
Dean

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 25, 2013 8:31 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Read Perf

Hi – I am doing a performance run using a modified YCSB client. I was able to populate 8 TB on a node and then ran some read workloads. I am seeing an average TPS of 930 ops/sec for random reads. There is no key cache/row cache.

Question – will the read TPS degrade if the data size increases to, say, 20 TB, 50 TB, 100 TB? If I understand correctly, reads should remain constant irrespective of the data size, since we eventually have sorted SSTables and a binary search is done on the index to find the row?

Thanks,
Kanwar
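[Dean] Here is the minimal sketch of the CQL for the changes mentioned above, assuming CQL3 on 1.2+ and a made-up keyspace/table name; as far as I know, existing SSTables keep their old filters until they are rewritten (by compaction, or nodetool upgradesstables/scrub):

  -- raise the bloom filter false-positive chance and switch to leveled compaction
  ALTER TABLE myks.mytable
    WITH bloom_filter_fp_chance = 0.1
    AND compaction = {'class': 'LeveledCompactionStrategy'};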