i only anticipate about 2,000,000 hot rows, each with about 4k of data. however, we will have a LOT of rows that just aren't used. right now the data is just one column with a blob of text in it, but new data is coming in constantly, so i'm not sure how that affects the cache, etc. i'm skeptical about using any cache at all, and would rather just rely on the OS page cache (as you mentioned). i've been trying this out to see if there's a performance gain somewhere, but i'm not seeing it.
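
just to sanity-check my thinking, here's the back-of-envelope i'm working from (the ~200 bytes/entry of on-heap overhead below is purely a guess on my part, not a measured number):

public class CacheSizing {
    public static void main(String[] args) {
        long hotRows = 2000000L;
        long rowBytes = 4 * 1024L;      // ~4k of data per row
        long entryOverhead = 200L;      // guessed per-entry JVM/bookkeeping overhead

        // what the OS page cache has to hold vs. what an on-heap row cache would hold
        long hotSetBytes = hotRows * rowBytes;
        long onHeapCacheBytes = hotRows * (rowBytes + entryOverhead);

        System.out.printf("hot set:           ~%.1f GB%n", hotSetBytes / 1e9);
        System.out.printf("on-heap row cache: ~%.1f GB%n", onHeapCacheBytes / 1e9);
    }
}

either way the hot set is only ~8 GB, so if the box has that much RAM free, the page cache should be able to hold all of it without me paying heap/GC cost for a second on-heap copy.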

Nathan McCall wrote:
The cache is a "second-chance FIFO" from this library:
http://code.google.com/p/concurrentlinkedhashmap/source/browse/trunk/src/java/com/reardencommerce/kernel/collections/shared/evictable/ConcurrentLinkedHashMap.java

That sounds like an awful lot of churn given the size of the queue and
the number of references it might keep for the second-chance stuff.
How big of a hot data set do you need to maintain? The amount of
overhead for such a large record set may not buy you anything over
just relying on the file system cache and turning down the heap size.

-Nate
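
(for my own notes, this is roughly what i understand "second-chance FIFO" to mean -- a toy, single-threaded sketch of the eviction idea, NOT the actual concurrent code behind that link. the extra "referenced" flag and queue slot per entry is the bookkeeping overhead you're talking about, which adds up fast at tens of millions of entries.)

import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

// toy second-chance FIFO cache: a hit sets a "referenced" bit on the entry;
// when the cache is full, the head of the FIFO gets one more trip through the
// queue if its bit is set, otherwise it is evicted.
public class SecondChanceFifoCache<K, V> {

    private static final class Entry<V> {
        V value;
        boolean referenced;
        Entry(V value) { this.value = value; }
    }

    private final int capacity;
    private final Map<K, Entry<V>> map = new HashMap<K, Entry<V>>();
    private final Queue<K> fifo = new LinkedList<K>();

    public SecondChanceFifoCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        e.referenced = true;            // mark for a second chance
        return e.value;
    }

    public void put(K key, V value) {
        Entry<V> existing = map.get(key);
        if (existing != null) {
            existing.value = value;
            existing.referenced = true;
            return;
        }
        while (map.size() >= capacity && !fifo.isEmpty()) {
            K victim = fifo.poll();
            Entry<V> ve = map.get(victim);
            if (ve == null) continue;   // shouldn't happen, but be safe
            if (ve.referenced) {
                ve.referenced = false;  // clear the bit and re-queue (the "second chance")
                fifo.offer(victim);
            } else {
                map.remove(victim);     // evict for real
            }
        }
        map.put(key, new Entry<V>(value));
        fifo.offer(key);
    }
}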

On Tue, Mar 16, 2010 at 1:17 PM, B. Todd Burruss <bburr...@real.com> wrote:
i think i'd better make sure i understand how the row/key cache works.  i
currently have both set to 10%.  so if cassandra needs to read data from an
sstable that has 100 million rows, it will cache 10,000,000 rows of data
from that sstable?  and if my row is ~4k, then we're looking at ~40gb used by
the cache?
