[ 
https://issues.apache.org/jira/browse/SOLR-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633402#comment-17633402
 ] 

Ben Manes edited comment on SOLR-15859 at 11/13/22 9:33 PM:
------------------------------------------------------------

bq. I figured that my synchronization additions wouldn't drastically alter 
performance because Caffeine is probably already doing something similar 
itself... I felt what I was adding to synchronization should be pretty fast and 
not cause major issues.

Caffeine does lock-free reads on a cache hit, so reads easily scale to hundreds 
of millions per second, whereas writes top out at ~50M/s. A global lock would 
throttle both to ~10M ops/s 
([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#read-100-1]).
The trick is that, on a read, Caffeine appends the event to one of several lossy 
ring buffers to batch those events, and uses a {{tryLock}} to replay them against 
the policy in a non-blocking fashion. This sidesteps the LRU problem that every 
read is a write to global state: instead, each read writes to a cheaper array, 
striped by thread id, on a best-effort basis, so the metadata write is almost 
free and does not block progress. 
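The read-batching idea can be illustrated with a much-simplified Java sketch. It is not Caffeine's implementation: the class name {{LossyReadBufferCache}}, the stripe count, the buffer size, and the use of an access-ordered {{LinkedHashMap}} as a plain LRU policy are all assumptions for illustration, and Caffeine's real buffers and eviction policy are more involved. Hits read a {{ConcurrentHashMap}} without locking and append the key to a lossy, thread-striped ring buffer; when a buffer is full, the reader drops the event and replays the batch only if {{tryLock}} succeeds, so readers never wait on the lock.

```java
import java.util.LinkedHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

final class LossyReadBufferCache<K, V> {
  private static final int STRIPES = 4;      // power of two; a stripe is picked by thread id (assumed)
  private static final int BUFFER_SIZE = 16; // power of two; slots per ring buffer (assumed)
  // Bounded, lossy ring buffer: offer() fails instead of blocking when full or contended.
  private static final class ReadBuffer<E> {
    final AtomicReferenceArray<E> slots = new AtomicReferenceArray<>(BUFFER_SIZE);
    final AtomicLong writeCounter = new AtomicLong();
    volatile long readCounter; // advanced only while holding the eviction lock
    boolean offer(E e) {
      long tail = writeCounter.get();
      if (tail - readCounter >= BUFFER_SIZE) {
        return false; // full: drop the event
      }
      if (!writeCounter.compareAndSet(tail, tail + 1)) {
        return false; // lost a race with another reader: drop rather than spin
      }
      slots.lazySet((int) (tail & (BUFFER_SIZE - 1)), e);
      return true;
    }
    void drainTo(Consumer<E> consumer) {
      long head = readCounter;
      long tail = writeCounter.get();
      for (; head < tail; head++) {
        int index = (int) (head & (BUFFER_SIZE - 1));
        E e = slots.get(index);
        if (e == null) {
          break; // slot reserved but not yet published; pick it up on a later drain
        }
        slots.lazySet(index, null);
        consumer.accept(e);
      }
      readCounter = head;
    }
  }

  private final int maximumSize;
  private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
  private final ReentrantLock evictionLock = new ReentrantLock();
  // Access-ordered map standing in for the LRU policy; guarded by evictionLock.
  private final LinkedHashMap<K, Boolean> lru = new LinkedHashMap<>(16, 0.75f, true);
  private final ReadBuffer<K>[] readBuffers;
  @SuppressWarnings("unchecked")
  LossyReadBufferCache(int maximumSize) {
    this.maximumSize = maximumSize;
    this.readBuffers = (ReadBuffer<K>[]) new ReadBuffer<?>[STRIPES];
    for (int i = 0; i < STRIPES; i++) {
      readBuffers[i] = new ReadBuffer<>();
    }
  }

  // Hit path: lock-free map read plus a best-effort append; replay only if the lock is free.
  V get(K key) {
    V value = data.get(key);
    if (value != null) {
      int stripe = (int) (Thread.currentThread().getId() & (STRIPES - 1));
      if (!readBuffers[stripe].offer(key) && evictionLock.tryLock()) {
        try {
          drainReadBuffers();
        } finally {
          evictionLock.unlock();
        }
      }
    }
    return value;
  }

  // Writes block on the lock, replay pending reads, then update the policy.
  void put(K key, V value) {
    evictionLock.lock();
    try {
      drainReadBuffers();
      data.put(key, value);
      lru.put(key, Boolean.TRUE);
      if (lru.size() > maximumSize) {
        K victim = lru.keySet().iterator().next(); // least recently used
        lru.remove(victim);
        data.remove(victim);
      }
    } finally {
      evictionLock.unlock();
    }
  }

  private void drainReadBuffers() {
    for (ReadBuffer<K> buffer : readBuffers) {
      buffer.drainTo(lru::get); // get() moves a still-present key to the MRU end
    }
  }
}
```

The design choice is that losing a few read events only makes the recency order slightly less precise, which is an acceptable trade for never making a reader block on the policy lock.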



> Add handler to dump filter cache
> --------------------------------
>
>                 Key: SOLR-15859
>                 URL: https://issues.apache.org/jira/browse/SOLR-15859
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Andy Lester
>            Assignee: Shawn Heisey
>            Priority: Major
>              Labels: FQ, cache, filtercache, metrics
>         Attachments: cacheinfo.patch, fix_92_startup.patch
>
>
> It would be very helpful to be able to inspect the contents of the 
> filterCache.
> I'd like to be able to query something like
> {{/admin/caches?type=filter&nentries=1000&sort=numHits+DESC}}
> nentries would be allowed to be -1 to get everything.
> It would be nice to see these data items for each entry. I don't know which 
> are available, but I'm thinking blue sky here:
>  * cache key, exactly as stored
>  * Timestamp when the entry was inserted
>  * Whether the insertion of the entry evicted another entry, and if so which 
> one
>  * Timestamp of when this entry was last hit
>  * Number of hits on this entry forever
>  * Number of hits on this entry over some time period
>  * Number of documents matched by the filter
>  * Number of bytes of memory used by the filter
> These are the sorts of questions I'd like to be able to answer:
>  * "I just did a query that I expect will have added a cache entry. Did it?"
>  * "Are my queries hitting existing cache entries?"
>  * "How big should I set my filterCache size? Should I limit it by number of 
> entries or RAM usage?"
>  * "Which of my FQs are getting used the most? These are the ones I want in 
> my firstSearcher queries." (I currently determine this by processing my old 
> Solr logs)
>  * "Which filters give me the most bang for the buck in terms of RAM usage?"
>  * "I have filter X and filter Y, but would it be beneficial if I made a 
> filter X AND Y?"
>  * "Which FQs are used more at certain times of the day? (Assuming I take 
> regular snapshots throughout the day)"
> I imagine a response might look like:
> {code:json}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 961
>   },
>   "response": {
>     "numFound": 12104,
>     "filterCacheKeys": [
>       {
>         "language:eng": {
>           "inserted": "2021-12-04T07:34:16Z",
>           "lastHit": "2021-12-04T18:17:43Z",
>           "numHits": 15065,
>           "numHitsInPastHour": 2319,
>           "evictedKey": "agelevel:4 shippable:Y",
>           "numRecordsMatchedByFilter": 24328753,
>           "bytesUsed": 3041094
>         }
>       },
>       {
>         "is_set:N": {
>           ...
>         }
>       },
>       {
>         "language:spa": {
>           ...
>         }
>       }
>     ]
>   }
> }
> {code}
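The selection behind the proposed {{nentries}} and {{sort=numHits+DESC}} parameters could look roughly like the following hypothetical sketch. {{FilterCacheDump}}, {{EntryStats}}, and {{select}} are illustrative names, not an existing Solr API; the sketch assumes per-entry hit counts are already being tracked somewhere, and it uses a Java 16+ record. It orders entries by hit count, highest first, and treats {{nentries=-1}} as "return everything", as the issue proposes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FilterCacheDump {
  // Hypothetical per-entry statistics; assumes the cache records hits per key.
  record EntryStats(String key, long numHits, long bytesUsed) {}

  // Sorts by numHits descending and keeps the first nentries;
  // nentries == -1 returns every entry.
  static List<EntryStats> select(List<EntryStats> entries, int nentries) {
    if (nentries < -1) {
      throw new IllegalArgumentException("nentries must be -1 or non-negative");
    }
    Stream<EntryStats> sorted = entries.stream()
        .sorted(Comparator.comparingLong(EntryStats::numHits).reversed());
    if (nentries != -1) {
      sorted = sorted.limit(nentries);
    }
    return sorted.collect(Collectors.toList());
  }
}
```

Sorting a snapshot like this is cheap relative to the cost of gathering per-entry statistics, which is the harder part of the request.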



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
