Cameron,
What is your cluster configuration, i.e., how many nodes, how many replicas
per node, how many replicas in each collection, and so on? Do you observe
consistent behavior for the same query if you always route that query via
the same "entry node" (i.e., not load balanced over the cluster)?
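
One way to check this is to send the identical request to a single node many
times and count how many distinct score sets come back. The following is only
a minimal sketch: the host and port in ENTRY_NODE are placeholders, and the
helper names (build_url, fingerprint, count_variants) are made up for
illustration. The field, collection names, and parameters are taken from your
example request.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a single fixed entry node; replace host and port with your own.
ENTRY_NODE = "http://localhost:8983"


def build_url(entry_node, collections, term, rows=500):
    """Build the multi-collection query URL against one specific node."""
    params = urlencode({
        "q": f"fulltext:{term}",
        "rows": rows,
        "fl": "id,score",
        "sort": "score desc",
    })
    return f"{entry_node}/solr/{','.join(collections)}/query?{params}"


def fingerprint(result):
    """Reduce a Solr JSON response to a hashable tuple of (id, score) pairs."""
    docs = result["response"]["docs"]
    return tuple((doc["id"], round(doc["score"], 6)) for doc in docs)


def fetch_json(url):
    """Fetch a URL and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def count_variants(url, runs=50, fetcher=fetch_json):
    """Run the same request `runs` times and count each distinct score set."""
    return Counter(fingerprint(fetcher(url)) for _ in range(runs))


# Example use (requires a reachable Solr node):
#   url = build_url(ENTRY_NODE, ["collection1", "collection2"], "shark")
#   variants = count_variants(url)
#   print(len(variants), "distinct score sets over", sum(variants.values()), "runs")
```

If this reports more than one distinct score set when pinned to one node, the
inconsistency is not caused by load balancing across entry nodes. Repeating
the run with ENTRY_NODE pointed at each node in turn shows whether the
behavior depends on which node receives the request.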
Michael

On Fri, Mar 19, 2021 at 11:16 AM Cameron M VandenBerg <c...@cs.cmu.edu>
wrote:

> Hello,
>
> I am using Solr in a distributed environment where I have split my
> collection into parts, which I have running on different nodes.  When I
> create each part of the collection, I set numShards and replicationFactor
> to 1.  The query speed is most important to us, and we are not worried
> about load on the system.
>
> I want a Distributed IDF across all parts of the collection so I have
> added this line to my solrconfig.xml:
> <statsCache class="org.apache.solr.search.stats.ExactStatsCache" />
>
> This seems to work about 90% of the time, but if I run the same request
> over and over again, sometimes I get scores with a local IDF for just one
> part of the collection.  Here is a request example:
>
> /solr/collection1,collection2/query?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc
>
> I still get documents from both collection1 and collection2, but sometimes
> I get the same scores as when I query only collection1.  I believe that in
> that case it is using only the document frequency from collection1 for
> the term.
>
> Should I use a different configuration?  I would like to make sure the IDF
> is always distributed and the same every time I run the same query.  Is
> there any technique I could use to ensure that this happens?
>
> Thank you,
> Cameron VandenBerg
>
>
