Wouldn't it be the case that the once-used rows in your batch process would quickly be evicted from the cache and replaced by frequently used rows? This would be the case even if your batch process goes on for a long time, since caching is done on a row-by-row basis. In effect, it would mean that part of your cache is taken up by the batch process, much as if you dedicated a permanent cache to the batch, except that it isn't permanent, so it's better!
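To put rough numbers on the question above, here is a toy Python simulation. It assumes a strict LRU row cache holding 100 rows, 50 "users" rows read round-robin by interactive traffic, and a batch job reading 2000 "batch" rows once each, with one user read for every 4 batch reads. All of these names and numbers are hypothetical, and Cassandra's real row cache is not necessarily strict LRU.

```python
from collections import OrderedDict

class LRUCache:
    """Toy row cache with strict LRU eviction (an assumption, not Cassandra's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()

    def get(self, key):
        """Return True on a hit; on a miss, cache the row and evict the LRU row if over capacity."""
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            return True
        self.rows[key] = None
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # drop the least recently used row
        return False

HOT = [("users", i) for i in range(50)]  # hypothetical rows read over and over by users

def batch_scan(num_rows, rows_per_user_read=4):
    """A batch job reading each 'batch' row once, interleaved with user reads of HOT rows."""
    accesses = []
    for n in range(num_rows):
        accesses.append(("batch", n))
        if n % rows_per_user_read == rows_per_user_read - 1:
            accesses.append(HOT[(n // rows_per_user_read) % len(HOT)])
    return accesses

def hot_hit_rate(cache, accesses):
    """Replay the accesses; return the hit rate over the 'users' reads only."""
    hot_hits = hot_reads = 0
    for key in accesses:
        hit = cache.get(key)
        if key[0] == "users":
            hot_reads += 1
            hot_hits += hit
    return hot_hits / hot_reads

cache = LRUCache(capacity=100)
hot_hit_rate(cache, HOT)                        # cold start: load the hot rows
before = hot_hit_rate(cache, HOT * 2)           # steady interactive traffic
during = hot_hit_rate(cache, batch_scan(2000))  # batch running alongside users
recovery = hot_hit_rate(cache, HOT)             # first pass after the batch ends
after = hot_hit_rate(cache, HOT)                # second pass after the batch ends
print(f"before={before:.3f} during={during:.3f} recovery={recovery:.3f} after={after:.3f}")
# before=1.000 during=0.024 recovery=0.260 after=1.000
```

Under strict LRU, a hot row stays cached only if fewer than `capacity` distinct rows are read between two reads of it. In this run that gap is 49 hot rows plus 200 batch rows, so the hot rows keep getting evicted for as long as the batch runs; once it stops, a single pass over the hot rows repopulates the cache.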
On Mon, May 2, 2011 at 7:50 AM, Tyler Hobbs <ty...@datastax.com> wrote:

>> If you had one big cache, wouldn't it be the case that it's mostly
>> populated with frequently accessed rows, and less populated with rarely
>> accessed rows?
>
> Yes.
>
>> In fact, wouldn't one big cache dynamically and automatically give you
>> exactly what you want? If you try to partition the same amount of memory
>> manually, by guesswork, among many tables, aren't you always going to do a
>> worse job?
>
> Suppose you have one CF that's used constantly through interaction by
> users. Suppose you have another CF that's only used periodically by a batch
> process, you tend to access most or all of the rows during the batch
> process, and it's too large to cache all of the rows. Normally, you would
> dedicate cache space to the first CF as anything with human interaction
> tends to have good temporal locality and you want to keep latencies there
> low. On the other hand, caching the second CF provides little to no real
> benefit. When you combine these two CFs, every time your batch process
> runs, rows from the second CF will populate the cache and will cause
> eviction of rows from the first CF, even though having those rows in the
> cache provides little benefit to you.
>
> As another example, if you mix a CF with wide rows and a CF with small
> rows, you no longer have the option of using a row cache, even if it makes
> great sense for the small-row CF data.
>
> Knowledge of data and access patterns gives you a very good advantage when
> it comes to caching your data effectively.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax <http://datastax.com/>
> Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
> Python client library
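Tyler's first example can be modeled the same way, again assuming strict LRU and made-up sizes: one shared 100-row cache versus a hypothetical split of 60 rows for the interactive CF and 40 rows for the batch CF. The sketch counts only the "users" reads made while the batch is running.

```python
from collections import OrderedDict

class LRUCache:
    """Toy strict-LRU cache: a miss inserts the key and evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False

class PartitionedCache:
    """One independent LRU per column family, each with its own hand-picked capacity."""

    def __init__(self, capacities):
        self.caches = {cf: LRUCache(cap) for cf, cap in capacities.items()}

    def access(self, key):
        cf, _row = key
        return self.caches[cf].access(key)  # a CF can only evict its own rows

def workload():
    """Hypothetical access trace: 100 user reads, then a batch scan with user reads mixed in."""
    hot = [("users", i) for i in range(50)]
    accesses = hot * 2  # interactive traffic before the batch starts
    for n in range(2000):  # batch job reads each "batch" row exactly once
        accesses.append(("batch", n))
        if n % 4 == 3:  # users keep reading the interactive CF meanwhile
            accesses.append(hot[(n // 4) % len(hot)])
    return accesses

def users_hit_rate_during_batch(cache):
    hits = total = 0
    for i, key in enumerate(workload()):
        hit = cache.access(key)
        if i >= 100 and key[0] == "users":  # skip the pre-batch warm-up reads
            total += 1
            hits += hit
    return hits / total

shared = users_hit_rate_during_batch(LRUCache(100))
split = users_hit_rate_during_batch(PartitionedCache({"users": 60, "batch": 40}))
print(f"shared={shared:.3f} partitioned={split:.3f}")  # shared=0.024 partitioned=1.000
```

Because each batch row is read exactly once, the batch CF gets no hits in either layout, so shifting memory toward the interactive CF costs nothing in this trace; picking that split by hand is the kind of "knowledge of data and access patterns" the reply refers to. With a different mix, such as a batch that rereads its rows, the trade-off could come out differently.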