We are using Solr 7.7.3 and have a collection with 20 shards, each with 4 replicas. We use the default BM25 similarity for scoring. When paging through search results, we would like the sort order to be deterministic, so that results are presented consistently and are not skipped or duplicated when paging up and down.
The problem we see is that scores differ depending on which replicas are hit. Because the scores of different documents are often very close, this can cause results to appear in a different order for the same query.

Looking at the explain output, I can see that the docCount used in the idf calculation differs between replicas. I assume this is because the number of deleted documents on each replica is not identical. A different idf gives slightly different scores, and so the order of results can change.

We're currently using a statsCache with the default value of LocalStatsCache. We tried switching to LRUStatsCache, but that didn't seem to help, i.e. the document counts were still inconsistent.

Is there an approach we can use to guarantee consistent ordering while still using most or all of the BM25 scoring logic? Do later versions of Solr help with this issue at all?

Thanks for any advice.

Peter Lancaster
Software developer, Findmypast
peter.lancas...@findmypast.com
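
For illustration, the idf effect described above can be sketched as follows. This is a minimal sketch that assumes the BM25 idf formula used by Lucene, ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The function name and all counts are hypothetical and are not taken from our index or explain output.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """idf term as computed by Lucene's BM25 similarity (assumed formula)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical counts: two replicas of the same shard hold the same live
# documents, but replica B still counts some deleted documents that have
# not yet been merged away, so its docCount is higher.
doc_freq = 1_200
replica_a = bm25_idf(doc_count=1_000_000, doc_freq=doc_freq)
replica_b = bm25_idf(doc_count=1_003_500, doc_freq=doc_freq)

print(f"idf on replica A: {replica_a:.6f}")
print(f"idf on replica B: {replica_b:.6f}")
print(f"difference:       {replica_b - replica_a:.6f}")
```

The difference is small, but it is enough to reorder two documents whose scores are nearly tied.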