Wouldn't the once-used rows from your batch process be quickly evicted
from the cache and replaced by frequently used rows? That would hold even
if your batch process runs for a long time, since caching is done on a
row-by-row basis. In effect, part of your cache would be taken up by the
batch process while it runs, much as if you had dedicated a permanent cache
to the batch, except that it isn't permanent, so it's better!
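
A quick way to sanity-check this is a toy simulation. The sketch below
assumes a plain LRU policy and made-up sizes; LRUCache and hot_hit_rate are
hypothetical names for illustration, not Cassandra's row cache code. It
measures the hit rate of a frequently read CF while a one-pass batch scan
shares the same cache.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU row cache (an assumption); tracks only which keys are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()

    def read(self, key):
        """Return True on a hit; on a miss, load the row and evict the LRU one."""
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            return True
        self.rows[key] = True
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # drop the least recently used row
        return False


def hot_hit_rate(capacity, hot_rows, batch_rows, hot_reads_per_batch_read):
    """Hit rate of the hot CF while a one-pass batch scan shares the cache."""
    cache = LRUCache(capacity)
    for i in range(hot_rows):  # warm the cache with the hot CF only
        cache.read(("hot", i))

    hits = reads = 0
    for batch_key in range(batch_rows):
        # Hot rows are read round-robin between consecutive batch reads.
        for _ in range(hot_reads_per_batch_read):
            hits += cache.read(("hot", reads % hot_rows))
            reads += 1
        cache.read(("batch", batch_key))  # each batch row is read exactly once
    return hits / reads


if __name__ == "__main__":
    # Illustrative sizes only: 150 cache slots, 100 hot rows, 1000 batch rows.
    print("batch as fast as hot traffic:", hot_hit_rate(150, 100, 1000, 1))
    print("batch 4x slower than hot traffic:", hot_hit_rate(150, 100, 1000, 4))
```

With these sizes the hot CF keeps hitting when the scan is slow relative
to the interactive traffic, and mostly misses when the scan is as fast as
that traffic, so in this toy model the answer depends on the scan rate and
the cache size rather than on the sharing alone.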


On Mon, May 2, 2011 at 7:50 AM, Tyler Hobbs <ty...@datastax.com> wrote:

>> If you had one big cache, wouldn't it be the case that it's mostly
>> populated with frequently accessed rows, and less populated with rarely
>> accessed rows?
>>
>
> Yes.
>
>> In fact, wouldn't one big cache dynamically and automatically give you
>> exactly what you want? If you try to partition the same amount of memory
>> manually, by guesswork, among many tables, aren't you always going to do a
>> worse job?
>>
>
> Suppose you have one CF that's used constantly through interaction by
> users.  Suppose you have another CF that's only used periodically by a batch
> process, where you tend to access most or all of the rows during each run,
> and it's too large to cache all of the rows.  Normally, you would
> dedicate cache space to the first CF, as anything with human interaction
> tends to have good temporal locality and you want to keep latencies there
> low.  On the other hand, caching the second CF provides little to no real
> benefit.  When the two CFs share one cache, every time your batch process
> runs, rows from the second CF will populate the cache and cause
> eviction of rows from the first CF, even though having the batch rows in
> the cache provides little benefit to you.
>
> As another example, if you mix a CF with wide rows and a CF with small
> rows, you no longer have the option of using a row cache, even if it makes
> great sense for the small-row CF data.
>
> Knowledge of data and access patterns gives you a very good advantage when
> it comes to caching your data effectively.
>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax <http://datastax.com/>
> Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
> Python client library
>
>
