Dear Mark,

I found an excellent resource on calculating the filterCache size in the solr-user mailing list archives[0]. In 2014 one user wrote an article detailing their extensive testing[1]. Basically, the filterCache can end up using many gigabytes of the JVM's heap, depending on how many documents are in your core. There is a formula for approximating this memory usage, where the "maxDoc" value comes from the core's overview page in the Solr admin UI:

    ((maxDoc/8) + 128) * (size_defined_in_solrconfig.xml)

According to the conversation on the solr-user mailing list, (maxDoc/8) + 128 is roughly how many bytes *each entry in the filterCache* can use (one bit per document in the core, plus some overhead). Multiplying by the cache size therefore gives how much heap the *whole* filterCache can use once it is full(!). In our case, we have about 149,000,000 documents[2] in our statistics core, so a full filterCache of 512 entries would use about:

    ((149374568/8) + 128) * 512 = 9560037888 bytes (8.9 GiB)

It's almost unfathomable; I don't even want to think about a larger filterCache with the current state of our statistics core! So I think we can leave this at 512 for now! At the very least you could examine the hit rate of the cache in the core's plugin/stats page in the Solr admin UI.

Cheers,

[0] http://lucene.472066.n3.nabble.com/Calculating-filterCache-size-td4142526.html
[1] https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10/edit
[2] The yearly Solr core sharding was broken for some time, so our core is *huge*. I think this was fixed in DSpace 5.7, but we were stuck on DSpace 5.5 until recently, so I haven't tried to shard yet.

Regards,

On Thu, Sep 20, 2018 at 11:10 AM Alan Orth <[email protected]> wrote:

> Dear Mark,
>
> I see that you can monitor cache status and evictions in the Solr UI. As
> far as I understand the filterCache, it will store the results of 512
> queries where a filter query (fq) was used. For example, if I want to see
> how many views a particular item has:
>
> http://localhost:3000/solr/statistics/select?q=*:*&fq=owningItem:11576&fq=statistics_type:view&fq=isBot:false&rows=0&wt=json&indent=true
>
> This selects all documents and filters the results by owningItem,
> statistics_type, and isBot. So because I used three `fq` parameters, there
> would be three entries in the filterCache. On a related note, I've read
> that if you combine filters into one parameter with AND or OR, like
> `fq=owningItem:11576+AND+isBot:false`, then there would be only one entry
> in the filterCache.
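
To check my understanding of how fq parameters map to cache entries, here is a rough Python sketch (my own illustration, not part of Solr or DSpace). It assumes, as a simplification, that each distinct `fq` string becomes one entry in a least-recently-used cache of 512 entries; Solr actually keys the cache on the parsed query, so real behaviour may differ. It also treats ((maxDoc/8) + 128) bytes per entry from the solr-user thread as a worst case. The names `worst_case_entry_bytes`, `worst_case_cache_bytes` and `FilterCacheModel` are hypothetical.

```python
"""Back-of-the-envelope filterCache model (illustrative sketch, not a Solr or DSpace API)."""
from collections import OrderedDict
from typing import Iterable


def worst_case_entry_bytes(max_doc: int) -> int:
    # One bit per document in the core, plus ~128 bytes of overhead,
    # per the estimate quoted in the solr-user thread.
    return max_doc // 8 + 128


def worst_case_cache_bytes(max_doc: int, cache_size: int) -> int:
    # Upper bound: assumes every one of the cache_size entries is a full bitset.
    return worst_case_entry_bytes(max_doc) * cache_size


class FilterCacheModel:
    """Toy LRU model: each distinct fq string is one cache key (a simplification)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries = OrderedDict()  # fq string -> None, oldest first
        self.lookups = 0
        self.hits = 0
        self.evictions = 0

    def query(self, fqs: Iterable[str]) -> None:
        # Each fq parameter of a request is looked up (and cached) separately.
        for fq in fqs:
            self.lookups += 1
            if fq in self.entries:
                self.hits += 1
                self.entries.move_to_end(fq)  # mark as most recently used
            else:
                self.entries[fq] = None
                if len(self.entries) > self.size:
                    self.entries.popitem(last=False)  # evict least recently used
                    self.evictions += 1

    def hit_ratio(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0


if __name__ == "__main__":
    separate = FilterCacheModel(size=512)
    separate.query(["owningItem:11576", "statistics_type:view", "isBot:false"])
    print(len(separate.entries))  # 3

    combined = FilterCacheModel(size=512)
    combined.query(["owningItem:11576 AND statistics_type:view AND isBot:false"])
    print(len(combined.entries))  # 1

    # 1,000 distinct item filters overflow a 512-entry cache.
    busy = FilterCacheModel(size=512)
    busy.query([f"owningItem:{i}" for i in range(1000)])
    print(len(busy.entries), busy.evictions)  # 512 488

    total = worst_case_cache_bytes(149374568, 512)
    print(total, f"{total / 2**30:.1f} GiB")  # 9560037888 8.9 GiB
```

Running it shows three entries for the separate filters versus one for the combined filter, 488 evictions once 1,000 distinct item filters pass through a 512-entry cache, and the same 9560037888-byte (8.9 GiB) worst case as the formula for our statistics core.
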
>
> Logically, it's easy to see that the default of 512 is very conservative. A
> site with only a few hundred items and a few hundred visits per day would
> easily fill this cache with the Solr queries generated by the Discovery
> sidebar facets and searches. This is definitely worth investigating and
> testing more.
>
> Resources:
> - http://yonik.com/advanced-filter-caching-in-solr/
> - http://blog.florian-hopf.de/2014/05/solr-cache-sizes-eclipse-memory-analyzer.html
>
> On Wed, Sep 19, 2018 at 3:40 PM Mark H. Wood <[email protected]> wrote:
>
>> An interesting question. How would one measure the actual cache
>> effectiveness? It seems to me that changes to this sort of thing would be
>> difficult to judge by simply observing overall performance.
>>
>> --
>> All messages to this mailing list should adhere to the DuraSpace Code of
>> Conduct: https://duraspace.org/about/policies/code-of-conduct/
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "DSpace Technical Support" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/dspace-tech.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Alan Orth
> [email protected]
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." ―Friedrich Nietzsche

--
Alan Orth
[email protected]
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche
