Hi,

You may want to ask this question on the Solr Users mailing list instead of this one, which is dedicated to the Lucene Java library: https://solr.apache.org/community.html#mailing-lists-chat
Jan

> On 16 Mar 2021 at 20:55, Cameron M VandenBerg <c...@cs.cmu.edu> wrote:
>
> Hello,
>
> I am using Solr in a distributed environment where I have split my collection into parts, which I have running on different nodes. When I create each part of the collection, I set numShards and replicationFactor to 1. Query speed is most important to us, and we are not worried about load on the system.
>
> I want a distributed IDF across all parts of the collection, so I have added this line to my solrconfig.xml:
>
> <statsCache class="org.apache.solr.search.stats.ExactStatsCache" />
>
> This seems to work about 90% of the time, but if I run the same request over and over again, sometimes I get scores with a local IDF for just one part of the collection. Here is a request example:
>
> /solr/collection1,collection2/query?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc
>
> I still get documents from both collection1 and collection2, but sometimes I get scores that are the same as when I query collection1 alone. I believe that in those cases it is only using the document frequency of the term in collection1.
>
> Should I use a different configuration? I would like to make sure the IDF is always distributed and the same every time I run the same query. Is there any technique I could use to ensure that this happens?
>
> Thank you,
> Cameron VandenBerg
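When reposting to the Solr list, it may help to quantify the "about 90% of the time" observation. Below is a minimal sketch, assuming Solr is reachable at a hypothetical `http://localhost:8983/solr` and that the `/query` handler returns JSON (its default in recent Solr versions). It replays the exact request from the message and reports which runs returned a different id-to-score mapping than the first run. The helper names (`build_url`, `fetch_scores`, `find_inconsistent_runs`) are illustrative only.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# Assumption: hypothetical base URL; adjust host/port for your own cluster.
SOLR_BASE = "http://localhost:8983/solr"


def build_url(base: str = SOLR_BASE) -> str:
    """Build the same multi-collection request quoted in the message."""
    params = {
        "q": "fulltext:shark",
        "rows": "500",
        "fl": "id,url,title,score",
        "sort": "score desc",  # urlencode turns the space into '+'
    }
    return f"{base}/collection1,collection2/query?" + urllib.parse.urlencode(params)


def fetch_scores(url: str) -> Dict[str, float]:
    """Run one query and map each returned document id to its score."""
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return {doc["id"]: doc["score"] for doc in body["response"]["docs"]}


def find_inconsistent_runs(fetch: Callable[[], Dict[str, float]],
                           runs: int = 20) -> List[int]:
    """Call `fetch` `runs` times; return indices of runs whose scores differ from run 0."""
    baseline = fetch()
    mismatches = []
    for i in range(1, runs):
        if fetch() != baseline:
            mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    url = build_url()
    bad = find_inconsistent_runs(lambda: fetch_scores(url), runs=20)
    print(f"{len(bad)} of 19 repeat runs differed from the first run: {bad}")
```

Passing the fetcher in as a callable keeps the comparison logic independent of the network, so it can be exercised without a live Solr instance. Exact float equality is deliberate here: if the global IDF were applied consistently, identical requests should return identical scores.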