> Thanks Peter, but... hmmm, are you saying that even after a cache miss which
> results in a disk read and blocks being moved to the ssd, that by the next
> cache miss for the same data and subsequent same file blocks, that the ssd
> is unlikely to have those same blocks present anymore?
I am saying that regardless of whether the cache is memory, ssd, a combination of both, or anything else, most workloads tend to be subject to diminishing returns. Doubling cache from 5 GB to 10 GB might get you from a 10% to a 50% cache hit ratio, but doubling again to 20 GB might only get you to 60%, and doubling again to 40 GB to 65% (to use some completely arbitrary numbers for demonstration purposes).

The reason a cache can be more effective than the ratio of its size to the total data set is that there is a hotspot/working set that is smaller than the total data set. If you have completely random access this won't be the case, and a cache sized at n% of the total data will give you roughly an n% cache hit ratio. But for most workloads you have a hotter working set, so you get more bang for the buck when caching. For example, if 99% of all accesses go to 10% of the data, then a cache the size of 10% of the data gets you a 99% cache hit ratio. But clearly, no matter how much more cache you add, you will never cache more than 100% of reads; so in this (artificial, arbitrary) scenario, once you are caching 10% of your data, the cost of caching the final small percentage of accesses might be 10 times that of the original cache.

I did a quick Google but didn't find a good piece describing it more properly; hopefully the above is helpful. Some related reading might be http://en.wikipedia.org/wiki/Long_Tail

--
/ Peter Schuller
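[To make the diminishing-returns point concrete, here is a minimal sketch, not from the original thread: it replays a skewed access pattern (99% of accesses hitting 10% of the items, as in the example above) against a simple LRU cache and prints the hit ratio as the cache size doubles. The item counts, function name, and parameters are arbitrary illustrative choices.]

```python
import random
from collections import OrderedDict

def simulate_hit_ratio(cache_size, n_items=100_000, n_accesses=500_000,
                       hot_fraction=0.10, hot_access_prob=0.99, seed=42):
    """Replay a skewed access pattern against an LRU cache; return the hit ratio.

    hot_fraction of the items receive hot_access_prob of all accesses
    (the "99% of accesses hit 10% of the data" scenario).
    """
    rng = random.Random(seed)
    hot_cutoff = int(n_items * hot_fraction)
    cache = OrderedDict()  # key -> None, ordered by recency of use
    hits = 0
    for _ in range(n_accesses):
        if rng.random() < hot_access_prob:
            key = rng.randrange(hot_cutoff)           # hot working set
        else:
            key = rng.randrange(hot_cutoff, n_items)  # long-tail accesses
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            hits += 1
        else:
            cache[key] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / n_accesses

if __name__ == "__main__":
    # Cache sizes of 5%, 10%, 20%, 40% of the data set: the hit ratio
    # climbs steeply until the hot set fits, then flattens out.
    for size in (5_000, 10_000, 20_000, 40_000):
        print(f"cache of {size:>6} items -> hit ratio {simulate_hit_ratio(size):.2%}")
```

Running it shows the hit ratio jumping sharply as the cache grows toward the size of the hot set, then improving only marginally with each further doubling, which is the diminishing-returns curve described above.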