Dear Mark,

I found an excellent resource on calculating the filterCache size in the
solr-user mailing list archives[0]. In 2014 one user also wrote up their
extensive testing in detail[1]. Basically, the filterCache can end up using
many gigabytes of the JVM's heap, depending on how many documents are in
your core. There is a formula for approximating this memory usage, where
the "maxDoc" value comes from the core's Overview page in the Solr admin
UI:

((maxDoc/8) + 128) * (size_defined_in_solrconfig.xml)

According to the conversation on the solr-user mailing list, (maxDoc/8) +
128 is roughly how many bytes *each entry in the filterCache* can use(!),
and multiplying that by the size defined in solrconfig.xml gives the memory
a full filterCache can use. In our case, we have about 149,000,000
documents[2] in our statistics core, so a full filterCache would use about:

((149374568/8) + 128) * 512 = 9560037888 bytes (8.9 GiB)
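The arithmetic above can be reproduced with a small Python sketch. It assumes, as the mailing-list formula does, that each cached filter is a bitset with one bit per document (maxDoc/8 bytes) plus roughly 128 bytes of overhead. Solr can store small filter results more compactly, so treat the result as a worst case. The function names are illustrative only.

```python
def filter_cache_entry_bytes(max_doc: int) -> int:
    """Approximate worst-case size of ONE filterCache entry, in bytes.

    Assumes a bitset of max_doc bits (max_doc / 8 bytes) plus about
    128 bytes of per-entry overhead, per the solr-user formula.
    """
    return max_doc // 8 + 128


def filter_cache_total_bytes(max_doc: int, cache_size: int) -> int:
    """Approximate worst-case size of a FULL filterCache, in bytes."""
    return filter_cache_entry_bytes(max_doc) * cache_size


if __name__ == "__main__":
    max_doc = 149_374_568  # maxDoc from the statistics core's Overview page
    size = 512             # filterCache size from solrconfig.xml
    per_entry = filter_cache_entry_bytes(max_doc)
    total = filter_cache_total_bytes(max_doc, size)
    print(f"per entry: {per_entry} bytes ({per_entry / 2**20:.1f} MiB)")
    print(f"full cache: {total} bytes ({total / 2**30:.1f} GiB)")
```

For our statistics core this gives about 17.8 MiB per entry and 9,560,037,888 bytes (8.9 GiB) for a full cache of 512 entries.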

It's almost unfathomable. I don't even want to think about a larger
filterCache with the current state of our statistics core! So I think we
can leave this at 512 for now. If you want to tune it, you could at least
examine the cache's hit rate on the core's Plugins / Stats page in the Solr
admin UI.
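The counters on that page can also be fetched over HTTP, which makes it easier to track the hit ratio over time and speaks to Mark's question below about measuring effectiveness. This is a minimal sketch, assuming a Solr 4.x-era core like the one bundled with DSpace 5, the `admin/mbeans` handler with `stats=true&cat=CACHE`, and the default flat JSON layout for `solr-mbeans`. Newer Solr versions report these stats under different names, and the helper function names are hypothetical.

```python
import json
from urllib.request import urlopen


def filter_cache_stats(mbeans: dict) -> dict:
    """Extract filterCache counters from an admin/mbeans JSON response.

    Assumes the default json.nl=flat layout, where "solr-mbeans" is a flat
    list alternating category names and objects:
    [category, {bean_name: {...}}, category, {...}, ...]
    """
    flat = mbeans["solr-mbeans"]
    categories = dict(zip(flat[0::2], flat[1::2]))
    stats = categories["CACHE"]["filterCache"]["stats"]
    lookups = int(stats["lookups"])
    hits = int(stats["hits"])
    return {
        "lookups": lookups,
        "hits": hits,
        # Computed here from hits and lookups; avoids dividing by zero
        # on a freshly opened searcher with no lookups yet.
        "hit_ratio": hits / lookups if lookups else 0.0,
        "evictions": int(stats["evictions"]),
        "size": int(stats["size"]),
    }


def fetch_filter_cache_stats(base_url: str, core: str) -> dict:
    """Query a core's admin/mbeans handler for its cache statistics."""
    url = f"{base_url}/{core}/admin/mbeans?stats=true&cat=CACHE&wt=json"
    with urlopen(url) as response:
        return filter_cache_stats(json.load(response))
```

For example, `fetch_filter_cache_stats("http://localhost:8080/solr", "statistics")`, where the base URL is a placeholder for wherever your Solr listens. A hit ratio that stays low while evictions keep climbing would suggest the cache is too small for the workload, whereas a high ratio with few evictions suggests 512 is adequate.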

Cheers,

[0] http://lucene.472066.n3.nabble.com/Calculating-filterCache-size-td4142526.html
[1] https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10/edit
[2] The yearly Solr core sharding was broken for some time, so our core is
*huge*. I think this was fixed in DSpace 5.7, but we were stuck on DSpace
5.5 until recently so I haven't tried to shard yet.

Regards,

On Thu, Sep 20, 2018 at 11:10 AM Alan Orth <[email protected]> wrote:

> Dear Mark,
>
> I see that you can monitor cache status and evictions in the Solr UI. As
> far as I understand the filterCache, it stores the results of up to 512
> filter queries (fq). For example, if I want to see how many views a
> particular item has:
>
>
> http://localhost:3000/solr/statistics/select?q=*:*&fq=owningItem:11576&fq=statistics_type:view&fq=isBot:false&rows=0&wt=json&indent=true
>
> This selects all documents and filters the results by owningItem,
> statistics_type, and isBot. Because I used three `fq` parameters, there
> would be three entries in the filterCache. On a related note, I've read
> that if you combine filters into one parameter with AND or OR, like
> `fq=owningItem:11576+AND+isBot:false`, then there would be only one entry
> in the filterCache.
>
> Logically it's easy to see that the default of 512 is very conservative. A
> site with only a few hundred items and a few hundred visits per day would
> easily fill this cache with the Solr queries generated by the Discovery
> sidebar facets and searches. This is definitely worth investigating and
> testing more.
>
> Resources:
> - http://yonik.com/advanced-filter-caching-in-solr/
> - http://blog.florian-hopf.de/2014/05/solr-cache-sizes-eclipse-memory-analyzer.html
>
> On Wed, Sep 19, 2018 at 3:40 PM Mark H. Wood <[email protected]> wrote:
>
>> An interesting question. How would one measure the actual cache
>> effectiveness? It seems to me that changes to this sort of thing would be
>> difficult to judge by simply observing overall performance.
>>
>> --
>> All messages to this mailing list should adhere to the DuraSpace Code of
>> Conduct: https://duraspace.org/about/policies/code-of-conduct/
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "DSpace Technical Support" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/dspace-tech.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
> Alan Orth
> [email protected]
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>
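The two filter styles from the quoted message can be compared by building the request URLs in Python. This is a sketch: the `localhost:3000` base URL is taken from the example query above, the `select_url` helper is made up for illustration, and it assumes, as the message says, that Solr caches each distinct `fq` string as its own filterCache entry.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:3000/solr/statistics/select"


def select_url(filters: List[str]) -> str:
    """Build a rows=0 select URL with one fq parameter per filter."""
    params = [("q", "*:*")]
    params += [("fq", f) for f in filters]
    params += [("rows", "0"), ("wt", "json")]
    return f"{BASE}?{urlencode(params)}"


# Three fq parameters: up to three separate filterCache entries.
separate = select_url(["owningItem:11576", "statistics_type:view", "isBot:false"])

# One combined fq: a single filterCache entry for the whole conjunction.
combined = select_url(["owningItem:11576 AND statistics_type:view AND isBot:false"])

for url in (separate, combined):
    fqs = parse_qs(urlsplit(url).query)["fq"]
    print(len(fqs), url)
```

The trade-off: separate entries such as `isBot:false` can be reused by many different queries, whereas a combined filter is only reused when exactly the same combination repeats, for example for the same item.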


