[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864543#comment-13864543
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
-------------------------------------------

bq. I think we could also do some intelligent sizing of the cache per-CF with 
the metrics we keep, that would be relatively static (so impervious to churn).

I'm not sure what I was thinking here.  (Maybe that we'd only need one cached 
partition per CF, which is nonsense.)  We do need LRU or similar behavior at a 
high level, just like we do with the row cache today.

The question is, how much of each partition do we cache?  I think it's a lot 
simpler if we decide we'll cache the same amount for each partition in a CF, 
and not try to be clever and "extend" a cached partition when we query for more 
later.
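
To make the "same amount for each partition" idea concrete, here is a minimal sketch, not the actual patch. The class name PartitionHeadCache and its parameters are hypothetical. It assumes put() is always handed the head of the partition as read from disk, and it never extends an entry: a query whose LIMIT exceeds the per-partition cap is simply a miss.

```python
from collections import OrderedDict


class PartitionHeadCache:
    """Hypothetical per-CF cache: LRU over partitions, fixed cell cap per partition."""

    def __init__(self, max_partitions, cells_per_partition):
        self.max_partitions = max_partitions
        self.cells_per_partition = cells_per_partition
        self._entries = OrderedDict()  # partition key -> leading cells

    def get(self, key, limit):
        """Return up to `limit` leading cells, or None if the cache cannot answer."""
        if limit > self.cells_per_partition:
            return None  # no "extending" of cached partitions: treat as a miss
        cells = self._entries.get(key)
        if cells is None:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return cells[:limit]

    def put(self, key, cells):
        """Store only the first `cells_per_partition` cells of the partition."""
        self._entries[key] = list(cells[:self.cells_per_partition])
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_partitions:
            self._entries.popitem(last=False)  # evict least recently used


cache = PartitionHeadCache(max_partitions=2, cells_per_partition=3)
cache.put("a", [1, 2, 3, 4, 5])
print(cache.get("a", 2))  # served from the cached head
print(cache.get("a", 5))  # LIMIT exceeds the cap, so this is a miss
```

Keeping the cap uniform per CF means every entry costs at most the same number of cells, so the LRU only has to reason about partition counts.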

So how much do we cache?  We can either:

# Make the user configure it, which requires creating new CQL syntax, or
# Determine it automatically

Personally I'd lean towards (2):
# Track an EstimatedHistogram of LIMITs in qualifying queries
# Set the cells-to-cache per CF so that we maximize the queries we can satisfy 
for a given cache size
# I think this also means we should go back to a separate cache per CF with its 
own size limit -- if we have 1000 queries/s against CF X's cache, then we 
shouldn't throw those away when a query against CF Y comes in where we expect 
only 10/s
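
The automatic option above could be sketched roughly as follows. This is an illustration under loud assumptions, not a design: a plain Counter stands in for the EstimatedHistogram of LIMITs, the function name choose_cells_to_cache and its parameters are made up, and reads are assumed to be spread uniformly over a known number of hot partitions, so the chance that a partition is cached is just (partitions that fit) / (hot partitions).

```python
from collections import Counter


def choose_cells_to_cache(limit_counts, cache_budget_cells, hot_partitions):
    """Pick the per-partition cell cap that maximizes the estimated fraction
    of queries served from a cache of `cache_budget_cells` cells.

    limit_counts maps an observed LIMIT to the number of queries that used it.
    Only observed LIMITs are tried as caps: between two observed values the
    satisfied-query count is flat while fewer partitions fit.
    """
    total = sum(limit_counts.values())
    best_cap, best_score = None, -1.0
    satisfied = 0
    for cap in sorted(limit_counts):
        satisfied += limit_counts[cap]  # queries with LIMIT <= cap
        cached_partitions = cache_budget_cells // cap
        # Assumption: uniform access over the hot partitions.
        partition_hit = min(1.0, cached_partitions / hot_partitions)
        score = (satisfied / total) * partition_hit
        if score > best_score:
            best_cap, best_score = cap, score
    return best_cap, best_score


limits = Counter({10: 700, 100: 250, 1000: 50})
print(choose_cells_to_cache(limits, cache_budget_cells=10_000, hot_partitions=200))
print(choose_cells_to_cache(limits, cache_budget_cells=100_000, hot_partitions=200))
```

With the small budget the best cap is 10 cells, since caching 100 cells per partition would fit only half of the hot partitions; with ten times the budget the cap moves up to 100. A real implementation would need a better access model than uniform, which is one more reason a per-CF size limit is easier to reason about.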

In the interest of shipping sooner rather than later, though, I'll take whatever 
we can reasonably do for 2.1.0 and push the rest out to improve later.  If we 
just have a single "cache this many cells" parameter in cassandra.yaml, that's 
still better than people OOMing themselves with the classic row cache.

> Query cache
> -----------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>             Fix For: 2.1
>
>
> I think that most people expect the row cache to act like a query cache, 
> because that's a reasonable model.  Caching the entire partition is, in 
> retrospect, not really reasonable, so it's not surprising that it catches 
> people off guard, especially given the confusion we've inflicted on ourselves 
> as to what a "row" constitutes.
> I propose replacing it with a true query cache.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
