> Thanks Peter, but... hmmm, are you saying that even after a cache miss which
> results in a disk read and blocks being moved to the ssd, that by the next
> cache miss for the same data and subsequent same file blocks, that the ssd
> is unlikely to have those same blocks present anymore?

I am saying that regardless of whether the cache is memory, SSD, a
combination of both, or anything else, most workloads tend to be
subject to diminishing returns. Doubling cache from 5 GB to 10 GB
might get you from a 10% to a 50% cache hit ratio, but doubling again
to 20 GB might only get you to 60%, and doubling again to 40 GB to 65%
(to use some completely arbitrary numbers for demonstration purposes).

The reason a cache can be more effective than the ratio of its size
to the total data set is that there is a hotspot/working set that is
smaller than the total data set. If you have completely uniform random
access this won't be the case, and a cache sized at n% of the data
will give you roughly an n% cache hit ratio.

But for most workloads you have a hotter working set, so you get more
bang for the buck when caching. For example, if 99% of all accesses go
to 10% of the data, then a cache the size of 10% of the data gets you
a 99% cache hit ratio. But clearly, no matter how much more cache you
add, you will never serve more than 100% of reads from cache; so in
this (artificial, arbitrary) scenario, once you're caching 10% of your
data, the cost of caching the final small percentage of accesses might
be 10 times that of the original cache.
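
If it helps to make it concrete, here is a quick Python sketch using
the same made-up 99%/10% split, with the simplifying (toy-model)
assumption that the cache simply ends up holding the hottest blocks.
It estimates hit ratios for a uniform random workload vs. a hotspot
workload at a few cache sizes:

    import random

    ITEMS = 100_000       # distinct blocks in the data set
    ACCESSES = 500_000    # reads to simulate per measurement

    def access_uniform():
        # completely random access over the whole data set
        return random.randrange(ITEMS)

    def access_hotspot():
        # 99% of reads go to the first 10% of blocks, 1% to the rest
        if random.random() < 0.99:
            return random.randrange(ITEMS // 10)
        return random.randrange(ITEMS // 10, ITEMS)

    def hit_ratio(access, cache_fraction):
        # toy model: the cache holds the hottest cache_fraction of blocks,
        # i.e. the blocks with the lowest ids
        cache_size = int(ITEMS * cache_fraction)
        hits = sum(1 for _ in range(ACCESSES) if access() < cache_size)
        return hits / ACCESSES

    for frac in (0.05, 0.10, 0.20, 0.40):
        print(f"cache {frac:4.0%}: "
              f"uniform ~{hit_ratio(access_uniform, frac):.0%}, "
              f"hotspot ~{hit_ratio(access_hotspot, frac):.0%}")

For the hotspot workload the hit ratio jumps to roughly 99% once the
cache covers 10% of the data and then barely moves no matter how much
more you add, while the uniform workload just tracks the cache size.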

I did a quick Google search but didn't find a good piece describing
this more rigorously; hopefully the above is helpful. Some related
reading might be http://en.wikipedia.org/wiki/Long_Tail

-- 
/ Peter Schuller