[ https://issues.apache.org/jira/browse/SOLR-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636241#comment-17636241 ]
Ben Manes commented on SOLR-16546:
----------------------------------

The algorithm achieves a hit rate that is at or near the best across a wide variety of workloads and competing policies. That includes workloads that change over time, as it adapts to the observed access pattern. It takes into account both recency and frequency, including the popularity history of recently evicted entries (tracked by an aged histogram). It should keep your most valuable entries and correct itself if it makes too many mispredictions. Of course, you are welcome to capture an access trace (a log of key hashes) if you want to see an analysis from the simulator.

The main limitation of this policy is that it does not account for latency cost, e.g. giving a bias towards retaining slow queries over more frequent fast ones. There is little research on this topic, and the papers that do exist use private traces, most often to block competing research. I have a good idea for an approach that in theory might work very well and be inexpensive, but I do not have the data to analyze it with, so I can't justify implementing it blindly. Thankfully, hit rate is still a good approximate metric for tuning towards user-perceived response times, so Caffeine should be pretty solid unless proven otherwise.

> Faceting puts an entry for each q into the filterCache
> -------------------------------------------------------
>
>                 Key: SOLR-16546
>                 URL: https://issues.apache.org/jira/browse/SOLR-16546
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: faceting
>    Affects Versions: 9.0
>            Reporter: Andy Lester
>            Priority: Minor
>
> I noticed that I was getting far more entries in the filterCache than I was
> expecting. All my app's FQs are driven by the app itself. There are only a
> couple dozen FQs possible in our queries, but I'd be getting ~10K cache
> ejections every hour. That didn't make any sense.
>
> So I investigated and discovered that making a query using facets adds an
> entry to the filterCache. Here's my demonstration.
>
> The script show-results is this:
> {{curl -s "$URL/twit/admin/cache" | jq -S .queries
> curl -s "$URL/admin/metrics" | jq '.metrics."solr.core.twit"."CACHE.searcher.filterCache".inserts'
> }}
>
> The /admin/cache handler is Shawn Heisey's cache dumper he's working on in
> ticket SOLR-15859.
>
> {{# Freshly started Solr. No cache entries.
> $ ./show-results
> {}
> 0
> # Query on "alpha" with facets on.
> $ curl -s "$URL/twit/select?q=title:alpha&rows=0&facet=on&facet.field=grouping"
> # Now there is a filter cache entry.
> $ ./show-results
> {
>   "title:alpha": 0
> }
> 1
> # Query on "beta" with facets on. "beta" shows up in the cache.
> $ curl -s "$URL/twit/select?q=title:beta&rows=0&facet=on&facet.field=grouping"
> $ ./show-results
> {
>   "title:alpha": 0,
>   "title:beta": 0
> }
> 2
> # Now query on "gamma" with facets OFF.
> $ curl -s "$URL/twit/select?q=title:gamma&rows=0&facet=off&facet.field=grouping"
> # The "gamma" does not show up in the filter cache.
> $ ./show-results
> {
>   "title:alpha": 0,
>   "title:beta": 0
> }
> 2
> # Now do same query on "gamma" with facets ON.
> $ curl -s "$URL/twit/select?q=title:gamma&rows=0&facet=on&facet.field=grouping"
> # The "gamma" shows up.
> $ ./show-results
> {
>   "title:alpha": 0,
>   "title:beta": 0,
>   "title:gamma": 0
> }
> 3
> }}
>
> Is this correct behavior? Do I need to adjust my filterCache to allow for
> this?

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
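The comment above describes the frequency side of the policy: popularity is tracked even for evicted entries and is aged over time. A minimal Java sketch of a TinyLFU-style admission filter can make that concrete. It assumes a simplified design. A count-min sketch with four rows of small counters records every access, including accesses to keys that are no longer cached. The counters saturate at 15, as 4-bit counters would. All counters are halved after a fixed number of increments, so old popularity ages out. A new entry is admitted only if its estimated frequency exceeds that of the eviction victim.

The class name AgedFrequencySketch, the hash seeds, and the sizing are illustrative assumptions, not Caffeine's code. The sizing uses a power-of-two width of at least 16, and aging runs every 10 times the width in increments. Caffeine's W-TinyLFU policy also uses an LRU admission window and adapts that window's size to the workload, which this sketch does not model.

```java
/**
 * Hypothetical, simplified TinyLFU-style frequency sketch; NOT Caffeine's code.
 * A count-min sketch estimates how often a key has been requested (including
 * keys that were already evicted), and all counters are periodically halved so
 * that old popularity fades ("aging").
 */
public class AgedFrequencySketch {
    private static final int DEPTH = 4;       // independent hash rows
    private static final int MAX_COUNT = 15;  // saturate like a 4-bit counter
    // Odd 64-bit constants used as per-row hash seeds (arbitrary, illustrative choice).
    private static final long[] SEEDS = {
        0x9E3779B97F4A7C15L, 0xC2B2AE3D27D4EB4FL, 0x165667B19E3779F9L, 0xD6E8FEB86659FD93L
    };

    private final int[][] table;
    private final int width;       // counters per row, a power of two
    private final int sampleSize;  // counted increments before the sketch ages
    private int additions;

    public AgedFrequencySketch(int expectedEntries) {
        int w = 16;
        while (w < expectedEntries) {
            w <<= 1;
        }
        this.width = w;
        this.sampleSize = 10 * w;
        this.table = new int[DEPTH][w];
    }

    // Maps a key's hash to a counter index in the given row.
    private int indexOf(int hash, int row) {
        long h = (hash + SEEDS[row]) * SEEDS[row];
        h ^= h >>> 32;
        return (int) h & (width - 1);
    }

    /** Records one access to key, whether or not it is currently cached. */
    public void increment(Object key) {
        int hash = key.hashCode();
        boolean added = false;
        for (int row = 0; row < DEPTH; row++) {
            int i = indexOf(hash, row);
            if (table[row][i] < MAX_COUNT) {
                table[row][i]++;
                added = true;
            }
        }
        if (added && ++additions >= sampleSize) {
            age();
        }
    }

    /** Estimated access count: the minimum across rows limits overcounting from collisions. */
    public int frequency(Object key) {
        int hash = key.hashCode();
        int min = MAX_COUNT;
        for (int row = 0; row < DEPTH; row++) {
            min = Math.min(min, table[row][indexOf(hash, row)]);
        }
        return min;
    }

    /** Halves every counter so that stale popularity decays. */
    public void age() {
        for (int[] row : table) {
            for (int i = 0; i < row.length; i++) {
                row[i] >>>= 1;
            }
        }
        additions /= 2;
    }

    /** Admits the candidate only if it looks more popular than the eviction victim. */
    public boolean admit(Object candidate, Object victim) {
        return frequency(candidate) > frequency(victim);
    }

    public static void main(String[] args) {
        AgedFrequencySketch sketch = new AgedFrequencySketch(64);
        for (int i = 0; i < 10; i++) {
            sketch.increment("title:alpha");
        }
        System.out.println(sketch.frequency("title:alpha")); // 10: only one key recorded so far
        sketch.age();
        System.out.println(sketch.frequency("title:alpha")); // 5 after halving
        sketch.increment("title:gamma"); // a one-off query
        // Expected false unless the two keys happen to collide in every row.
        System.out.println(sketch.admit("title:gamma", "title:alpha"));
    }
}
```

Because counters are shared across keys, frequency() can overestimate a key's count when that key collides with other keys in every row. Taking the minimum across four rows keeps such overestimates rare. The halving step is what lets a formerly popular query lose its advantage once traffic moves on.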