Hello,

I am using Solr in a distributed environment where I have split my collection 
into parts, each running on a different node.  When I create each part of the 
collection, I set numShards and replicationFactor to 1.  Query speed matters 
most to us, and we are not worried about load on the system.

I want distributed IDF across all parts of the collection, so I have added 
this line to my solrconfig.xml:
<statsCache class="org.apache.solr.search.stats.ExactStatsCache" />

This seems to work about 90% of the time, but if I run the same request over 
and over again, I sometimes get scores computed with the local IDF of just one 
part of the collection.  Here is an example request:
/solr/collection1,collection2/query?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc

I still get documents from both collection1 and collection2, but sometimes the 
scores are the same as when I query collection1 alone.  I believe that in those 
cases only the document frequency from collection1 is being used for the term.
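
To measure how often the scores change, a minimal sketch along these lines 
could repeat the request and count the distinct sets of scores that come back.  
The host and port (localhost:8983), the number of runs, the added wt=json 
parameter, and the function names are all assumptions for illustration; only 
the query path comes from the request above.

```python
import json
import urllib.request
from collections import Counter

# Assumption: placeholder host and port; replace with a real Solr node.
BASE_URL = "http://localhost:8983"
# Query path from the request above; wt=json is added here to force JSON output.
QUERY_PATH = (
    "/solr/collection1,collection2/query"
    "?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc&wt=json"
)


def fetch_scores(url):
    """Run the query once and return a {document id: score} mapping."""
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return {doc["id"]: doc["score"] for doc in body["response"]["docs"]}


def score_signature(scores, digits=4):
    """Reduce one run to a hashable, order-independent tuple of (id, rounded score)."""
    return tuple(sorted((doc_id, round(score, digits)) for doc_id, score in scores.items()))


def summarize_runs(runs, digits=4):
    """Count how many runs produced each distinct set of scores."""
    return Counter(score_signature(run, digits) for run in runs)


if __name__ == "__main__":
    # Hypothetical run count; more runs give a better estimate of the failure rate.
    runs = [fetch_scores(BASE_URL + QUERY_PATH) for _ in range(50)]
    summary = summarize_runs(runs)
    print("distinct score sets:", len(summary))
    for count in sorted(summary.values(), reverse=True):
        print("runs sharing one score set:", count)
```

If the IDF were always distributed, every run should fall into a single score 
set; more than one set would confirm the inconsistency described above.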

Should I use a different configuration?  I would like to make sure the IDF is 
always distributed and the same every time I run the same query.  Is there any 
technique I could use to ensure that this happens?

Thank you,
Cameron VandenBerg
