Hello,

I have a SolrCloud with 5 shards and 2 replicas per shard. I tried LocalStatsCache, ExactStatsCache and ExactSharedStatsCache back and forth, and saw only a minor difference between LocalStatsCache and the Exact... variants.

With 10 search results per page, as soon as I switched to the second page (hits 11 to 20) and forced a page reload a couple of times, the order of the results within the page changed. A result shown as hit number 14 was listed as hit number 16 the next time, and so on. Nothing reliable. Only the first page looked good.

Inspecting the scores, I saw minor changes between reloads, even with ExactStatsCache and ExactSharedStatsCache. Further checks on the replicas showed that they are never totally in sync: the number of docs and the segment count match, but nothing else does.
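
A rough sketch of why replicas with identical live documents can still score differently. It assumes the default BM25 similarity, whose IDF is computed from docFreq and docCount, and it relies on Lucene counting deleted documents in these statistics until their segments are merged away. The two Max Doc values are the ones reported for coll1_shard1 in this message and serve as a stand-in for docCount; the docFreq values are purely hypothetical.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF term of Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Max Doc of the two replicas (live plus not-yet-merged deleted docs).
max_doc_replica1 = 57_506_559
max_doc_replica2 = 57_494_890

# Hypothetical docFreq for one term; deleted docs still count, so the
# two replicas may report slightly different values for the same term.
doc_freq_replica1 = 120_000
doc_freq_replica2 = 119_950

idf_replica1 = bm25_idf(doc_freq_replica1, max_doc_replica1)
idf_replica2 = bm25_idf(doc_freq_replica2, max_doc_replica2)

print(f"replica1 idf: {idf_replica1:.6f}")
print(f"replica2 idf: {idf_replica2:.6f}")
print(f"difference:   {abs(idf_replica1 - idf_replica2):.6f}")
```

The difference is tiny, but it may be enough to swap two neighbouring hits whose scores are nearly equal, depending on which replica answers the request.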
coll1_shard1_replica1:
  Num Docs: 53576786
  Max Doc: 57506559
  Deleted Docs: 3929773
  Version: 135351
  Master (Searching): 1616078264682 22756
  Master (Replicable): 1616402397518 22844

coll1_shard1_replica2:
  Num Docs: 53576786
  Max Doc: 57494890
  Deleted Docs: 3918104
  Version: 135326
  Master (Searching): 1616078264683 22755
  Master (Replicable): 1616402397521 22843

Only Num Docs is the same (that is why we always get the same number of hits and also the same hits), but everything else is different. I think this is why we never get the same order of results when using ExactStatsCache or ExactSharedStatsCache. We are using CloudSolrj for loading.

I once ran a test and forced an optimize of the index: first a commit with expungeDeletes set to true, then an optimize to maxSegments 1. After that everything worked fine and the results stayed in order. But some weeks later the segment numbers had drifted apart and the problem was back.

I think this will never work correctly. It might only work if the replicas are totally in sync with each other.

Just my findings, without debugging into the code.

Regards
Bernd

On 19.03.21 at 16:15, Cameron M VandenBerg wrote:
Hello,

I am using Solr in a distributed environment where I have split my collection into parts, which run on different nodes. When I create each part of the collection, I set numShards and replicationFactor to 1. Query speed is most important to us, and we are not worried about load on the system.

I want a distributed IDF across all parts of the collection, so I have added this line to my solrconfig.xml:

  <statsCache class="org.apache.solr.search.stats.ExactStatsCache" />

This seems to work about 90% of the time, but if I run the same request over and over again, I sometimes get scores with a local IDF for just one part of the collection. Here is an example request:

  /solr/collection1,collection2/query?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc

I still get documents from both collection1 and collection2, but sometimes the scores are the same as if I had queried only collection1. I believe that in that case only the document frequency of collection1 is used for the term.

Should I use a different configuration? I would like to make sure the IDF is always distributed and the same every time I run the same query. Is there any technique I could use to ensure that this happens?

Thank you,
Cameron VandenBerg
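
To make the difference between a local and a distributed IDF concrete, here is a minimal sketch. It assumes the default BM25 similarity and uses made-up document frequencies and document counts for the term "shark" in collection1 and collection2; none of these numbers come from the actual collections. The distributed case simply sums the per-collection statistics before computing the IDF, which is the general idea behind exact distributed statistics and not a description of Solr's internal implementation.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF term of Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-collection statistics for the term "shark".
stats = {
    "collection1": {"doc_freq": 1_000, "doc_count": 1_000_000},
    "collection2": {"doc_freq": 50_000, "doc_count": 2_000_000},
}

# Local IDF: each collection only sees its own statistics.
local_idf = {
    name: bm25_idf(s["doc_freq"], s["doc_count"]) for name, s in stats.items()
}

# Distributed IDF: statistics are summed over all collections first.
global_idf = bm25_idf(
    sum(s["doc_freq"] for s in stats.values()),
    sum(s["doc_count"] for s in stats.values()),
)

for name, idf in local_idf.items():
    print(f"{name} local idf: {idf:.4f}")
print(f"global idf:            {global_idf:.4f}")
```

If a request falls back to the statistics of collection1 alone, every score is computed with the collection1 value instead of the global one, which would match the symptom described above.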