After a bit of searching, I think I've found the answer I've been looking for. 
I guess I didn't search hard enough before sending out this email. Thank you 
all for the responses.

According to the DataStax documentation [1], there are two types of row cache 
providers:

row_cache_provider
(Default: SerializingCacheProvider) Specifies what kind of implementation to 
use for the row cache.
SerializingCacheProvider: Serializes the contents of the row and stores it in 
native memory, that is, off the JVM Heap. Serialized rows take significantly 
less memory than live rows in the JVM, so you can cache more rows in a given 
memory footprint. Storing the cache off-heap means you can use smaller heap 
sizes, which reduces the impact of garbage collection pauses. It is valid to 
specify the fully-qualified class name to a class that 
implements org.apache.cassandra.cache.IRowCacheProvider.
ConcurrentLinkedHashCacheProvider: Rows are cached using the JVM heap, 
providing the same row cache behavior as Cassandra versions prior to 0.8.

The SerializingCacheProvider is 5 to 10 times more memory-efficient than 
ConcurrentLinkedHashCacheProvider for applications that are not blob-intensive. 
However, SerializingCacheProvider may perform worse in update-heavy workload 
situations because it invalidates cached rows on update instead of updating 
them in place as ConcurrentLinkedHashCacheProvider does.
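To make the difference concrete, here is a toy model of the two write policies described above. It is a sketch under stated assumptions, not Cassandra code: `RowCacheSketch`, `RowCache` and `Policy` are hypothetical names, a "row" is just a sorted map of column to value, and nothing here mirrors the real `IRowCacheProvider` interface or the off-heap serialization the SerializingCacheProvider actually performs. It models three things: reads populate the cache on a miss, a write to an uncached row leaves the cache alone, and a write to a cached row either invalidates it or merges into it in place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the two row cache write policies; not a Cassandra API.
public class RowCacheSketch {

    // INVALIDATE_ON_WRITE approximates SerializingCacheProvider,
    // UPDATE_IN_PLACE approximates ConcurrentLinkedHashCacheProvider.
    enum Policy { INVALIDATE_ON_WRITE, UPDATE_IN_PLACE }

    static final class RowCache {
        private final Policy policy;
        // Stand-in for the memtable/SSTables: the authoritative data.
        private final Map<String, TreeMap<String, String>> store = new HashMap<>();
        // The row cache itself: whole rows keyed by row key.
        private final Map<String, TreeMap<String, String>> cache = new HashMap<>();
        int misses = 0;

        RowCache(Policy policy) { this.policy = policy; }

        // Reads go through the cache; a miss loads the full row and caches it.
        TreeMap<String, String> read(String key) {
            TreeMap<String, String> cached = cache.get(key);
            if (cached != null) {
                return new TreeMap<>(cached);
            }
            misses++;
            TreeMap<String, String> row = new TreeMap<>(store.getOrDefault(key, new TreeMap<>()));
            cache.put(key, row);
            return new TreeMap<>(row);
        }

        // Writes always reach the store; only rows already cached are touched in the cache.
        void write(String key, String column, String value) {
            store.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
            TreeMap<String, String> cached = cache.get(key);
            if (cached == null) {
                return; // a write does not add an uncached row to the cache
            }
            if (policy == Policy.INVALIDATE_ON_WRITE) {
                cache.remove(key); // the next read must reload the whole row
            } else {
                cached.put(column, value); // merge the new column into the cached row
            }
        }

        boolean isCached(String key) { return cache.containsKey(key); }
    }

    public static void main(String[] args) {
        for (Policy policy : Policy.values()) {
            RowCache rc = new RowCache(policy);
            rc.write("user:1", "name", "faraaz"); // row not cached yet: cache untouched
            rc.read("user:1");                    // miss: row loaded into the cache
            rc.write("user:1", "city", "sf");     // the two policies diverge here
            System.out.println(policy + " cached after write: " + rc.isCached("user:1"));
            rc.read("user:1");
            System.out.println(policy + " misses: " + rc.misses);
        }
    }
}
```

Running `main` should show the invalidating policy taking a second miss after the update, while the in-place policy serves the second read from cache. In an update-heavy workload that extra reload of the whole row on every write is the cost the documentation warns about.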


The off-heap row cache provider does indeed invalidate rows. We're going to 
look into using the ConcurrentLinkedHashCacheProvider. Time to read some source 
code! :)

Faraaz

[1] 
http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__row_cache_provider




On Thursday, August 22, 2013 at 7:40 PM, Boris Yen wrote:

> If you are using off-heap memory for row cache, "all writes invalidate the 
> entire row" should be correct.
> 
> Boris
> 
> 
> On Fri, Aug 23, 2013 at 8:32 AM, Robert Coli <rc...@eventbrite.com 
> (mailto:rc...@eventbrite.com)> wrote:
> > On Wed, Aug 14, 2013 at 10:56 PM, Faraaz Sareshwala 
> > <fsareshw...@quantcast.com (mailto:fsareshw...@quantcast.com)> wrote:
> > > All writes invalidate the entire row (updates throw out the cached row)
> > This is not correct. Writes are added to the row, if it is in the row 
> > cache. If it's not in the row cache, the row is not added to the cache. 
> >  
> > Citation from jbellis on Stack Overflow, because I don't have time to find a 
> > better one and the code is not obvious about it:
> > 
> > http://stackoverflow.com/a/12499422 
> > 
> > > I have yet to go through the source code for the row cache. I do plan to 
> > > do that. Can someone point me to documentation on the row cache 
> > > internals? All I've found online so far is small discussion about it and 
> > > how to enable it. 
> > 
> > There is no such documentation, or at least if it exists I am unaware of it.
> > 
> > In general, the rule of thumb is that the Row Cache should not be used 
> > unless the rows in question are:
> > 
> > 1) Very hot in terms of access
> > 2) Uniform in size
> > 3) "Small"
> > 
> > =Rob  
