In that case, make sure you don't plan on going into the millions of
columns per row, or test the limit first, as I'm pretty sure it can't go
above 10 million (from previous posts on this list).

Dean

On 2/26/13 8:23 AM, "Kanwar Sangha" <kan...@mavenir.com> wrote:

>Thanks. For our case, the number of rows will more or less stay the same.
>The only thing that changes is the columns, and they keep getting added.
>
>-----Original Message-----
>From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
>Sent: 26 February 2013 09:21
>To: user@cassandra.apache.org
>Subject: Re: Read Perf
>
>To find stuff on disk, there is a bloom filter for each file held in
>memory.  Per the docs, 1 billion rows takes about 2 GB of RAM, so memory
>use depends heavily on your number of rows.  As you get more rows, you may
>need to raise the bloom filter false-positive chance to use less RAM, but
>that means slower reads.  I.e., as you add more rows, you will have slower
>reads on a single machine.
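>
>A rough back-of-the-envelope sketch of that scaling, using the standard
>bloom filter sizing formula (an approximation, not Cassandra's exact
>on-heap accounting):
>
>import math
>
>def bloom_filter_bytes(num_rows, fp_chance):
>    # Optimal bloom filter sizing: bits per element = -ln(p) / (ln 2)^2
>    bits_per_row = -math.log(fp_chance) / (math.log(2) ** 2)
>    return num_rows * bits_per_row / 8
>
># 1 billion rows at the old default false-positive chance of 0.000744
>print(bloom_filter_bytes(1e9, 0.000744) / 2**30)  # ~1.7 GiB, i.e. the "2Gig" ballpark
>
># Raising the false-positive chance to 0.1 shrinks that to roughly a third
>print(bloom_filter_bytes(1e9, 0.1) / 2**30)       # ~0.6 GiB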
>
>We hit the RAM limit on one machine with 1 billion rows, so we are in the
>process of raising the ratio from 0.000744 (the default) to 0.1 to buy us
>more time to solve the problem.  Since we see no I/O load on our machines
>(or rather extremely little), we plan on moving to leveled compaction,
>where 0.1 is the default in newer releases; I think the new default for
>size-tiered is 0.01.
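>
>For reference, a sketch of how such a change could be applied with CQL3
>through the DataStax Python driver (assuming Cassandra 1.2+; the keyspace
>'ks' and table 'cf' are placeholders, not our real schema):
>
>from cassandra.cluster import Cluster  # DataStax Python driver
>
>cluster = Cluster(['127.0.0.1'])        # contact point is a placeholder
>session = cluster.connect('ks')
>
># Raise the bloom filter false-positive chance and move to leveled compaction
>session.execute("""
>    ALTER TABLE cf
>    WITH bloom_filter_fp_chance = 0.1
>    AND compaction = {'class': 'LeveledCompactionStrategy'}
>""")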
>
>I.e., if you store more data per row, this is less of an issue, but it is
>still something to consider.  (Also, I think rows have a limit on data
>size as well, but I'm not sure what that is.  I know the column limit on a
>row is in the millions, somewhere below 10 million.)
>
>Later,
>Dean
>
>From: Kanwar Sangha <kan...@mavenir.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Monday, February 25, 2013 8:31 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Read Perf
>
>Hi - I am doing a performance run using a modified YCSB client. I was able
>to populate 8 TB on a node and then ran some read workloads. I am seeing
>an average throughput of 930 ops/sec for random reads. There is no key
>cache or row cache. Question -
>
>Will the read TPS degrade if the data size increases to, say, 20 TB, 50
>TB, or 100 TB?  If I understand correctly, read performance should remain
>constant irrespective of the data size, since we eventually have sorted
>SSTables and a binary search is done on the index to find the row?
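>
>(Roughly the mental model I have, as a toy sketch in Python - the names
>here are made up for illustration, not Cassandra internals: a per-SSTable
>bloom filter check followed by a binary search over its sorted index.)
>
>import bisect
>
>def read(sstables, key):
>    for table in sstables:
>        if key not in table['bloom']:        # in-memory check, modeled as a plain set
>            continue
>        keys = table['keys']                 # sorted key index for this SSTable
>        i = bisect.bisect_left(keys, key)    # O(log n) binary search
>        if i < len(keys) and keys[i] == key:
>            return table['values'][i]        # one seek/read in the real system
>    return None
>
># The question: does this per-read cost stay roughly constant as the total
># data grows from 8 TB towards 100 TB?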
>
>
>Thanks,
>Kanwar
