To find rows on disk, Cassandra keeps a bloom filter in memory for each SSTable.
Per the docs, roughly 1 billion rows takes about 2 GB of RAM, so memory use
depends heavily on your row count.  As you add more rows, you may need to raise
the bloom filter false-positive chance to use less RAM, but that means slower
reads.  I.e. as you add more rows, reads on a single machine get slower.
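
For a rough sense of the scaling, here is a back-of-the-envelope sketch using
the standard bloom filter sizing formula.  Cassandra builds one filter per
SSTable and adds some overhead on top, so treat this as an approximation rather
than the exact number you will see in practice:

    import math

    def bloom_filter_ram_gb(num_rows, fp_chance):
        # standard bloom filter approximation: bits = -n * ln(p) / (ln 2)^2
        bits = -num_rows * math.log(fp_chance) / (math.log(2) ** 2)
        return bits / 8 / 1024 ** 3

    # 0.000744 is the default false-positive chance mentioned below
    for rows in (10 ** 8, 10 ** 9, 10 ** 10):
        print("%d rows -> ~%.1f GB" % (rows, bloom_filter_ram_gb(rows, 0.000744)))

At that false-positive chance this works out to roughly 1.7-2 GB per billion
rows, which lines up with the figure in the docs.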

We hit the RAM limit on one machine at 1 billion rows, so we are in the process
of raising the false-positive chance from 0.000744 (the default) to 0.1 to buy
ourselves more time to solve this.  Since we see essentially no I/O load on our
machines, we also plan on moving to leveled compaction, where 0.1 is the
default in newer releases; the new default for size-tiered is 0.01, I think.
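
Just to show what that knob buys, here is the same approximation applied to
1 billion rows at a few false-positive settings (a sketch only; actual savings
depend on your SSTable layout):

    import math

    ROWS = 10 ** 9

    def filter_gb(fp_chance, rows=ROWS):
        # bits per element ~= -ln(p) / (ln 2)^2
        return -rows * math.log(fp_chance) / (math.log(2) ** 2) / 8 / 1024 ** 3

    # the property is set per column family, e.g. with
    # ALTER TABLE ... WITH bloom_filter_fp_chance = 0.1 in CQL3
    for fp in (0.000744, 0.01, 0.1):
        print("fp_chance=%g -> ~%.2f GB" % (fp, filter_gb(fp)))

Going from 0.000744 to 0.1 cuts the filter memory by roughly a factor of three,
at the cost of more wasted disk lookups on reads.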

I.e. if you store more data per row this is less of an issue, but it is still
something to consider.  (Rows also have a limit on data size, I think, but I am
not sure what it is.  I do know the column limit on a row is in the millions,
somewhere below 10 million.)

Later,
Dean

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 25, 2013 8:31 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Read Perf

Hi – I am doing a performance run using a modified YCSB client. I was able to
populate 8 TB on a node and then ran some read workloads. I am seeing an
average TPS of 930 ops/sec for random reads, with no key cache or row cache.
Question –

Will the read TPS degrade if the data size increases to, say, 20 TB, 50 TB, or
100 TB? If I understand correctly, reads should remain constant irrespective of
the data size, since we eventually have sorted SSTables and a binary search
would be done on the index to find the row?


Thanks,
Kanwar
