Hello, Peter.

Why don't you use one of the Exact*StatsCache implementations (ExactStatsCache or ExactSharedStatsCache)? I've always thought they could solve this problem.
Also, I've found https://issues.apache.org/jira/browse/SOLR-13257, which is about introducing replica.base in 9.0. I'm not sure whether it's a solution, though.
On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <peter.lancas...@findmypast.com> wrote:

> We are using Solr 7.7.3 and have a collection with 20 shards, each with 4
> replicas. We use the default BM25 similarity algorithm for scoring. When
> paging through search results, we would like the sort order to be
> deterministic, so that we present consistent results and avoid skipping or
> duplicating results when paging up and down.
>
> The problem we see is that scores differ depending on which replicas are
> hit. Because the scores of different documents are often very close, this
> can make results appear in a different order for the same query.
>
> Looking at the explain output, I can see that the docCount used in the
> idf calculation differs between replicas. I assume this is because the
> number of deleted documents on each replica is not identical. Because the
> idf differs, the scores differ slightly, and the order of results can
> therefore change.
>
> We're currently using the default statsCache, LocalStatsCache. We have
> tried switching to LRUStatsCache, but that didn't seem to help, i.e. the
> document counts were still inconsistent.
>
> Is there an approach we can use to guarantee consistent ordering while
> still using most or all of the BM25 scoring logic? Do later versions of
> Solr help with this issue at all?
>
> Thanks for any advice.
>
> Peter Lancaster
> Software developer, Findmypast
> peter.lancas...@findmypast.com

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
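The idf effect described in the quoted message can be reproduced with a small calculation. Below is a minimal Python sketch using Lucene's BM25 idf formula, ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)).

The following are hypothetical and chosen only for illustration:

- the document frequencies
- the per-document tf factors, which stand in for BM25's term-frequency and length normalisation
- the two docCount values

The sketch holds docFreq fixed for simplicity. It assumes, as Peter suspects, that one replica still counts deleted but not yet merged documents in docCount.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene-style BM25 idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical query: doc A matches a common term, doc B matches a rare term.
DOC_FREQ = {"common": 1000, "rare": 10}
# Hypothetical stand-ins for BM25's tf/length normalisation per document.
TF_FACTOR = {"A": 1.65, "B": 1.0}
MATCHED_TERM = {"A": "common", "B": "rare"}


def ranking(doc_count: int) -> list[str]:
    """Return doc ids ordered by descending score for a given docCount."""
    scores = {
        doc: TF_FACTOR[doc] * bm25_idf(doc_count, DOC_FREQ[MATCHED_TERM[doc]])
        for doc in TF_FACTOR
    }
    return sorted(scores, key=scores.get, reverse=True)


# Two replicas of the same shard with identical live documents; replica 2
# (hypothetically) still counts 200,000 deleted-but-unmerged docs in docCount.
print(ranking(1_000_000))  # replica 1 → ['B', 'A']
print(ranking(1_200_000))  # replica 2 → ['A', 'B']
```

With these numbers, the two documents swap places. The scores are close (about 11.40 vs 11.46 on one replica and 11.70 vs 11.65 on the other), and only the counted docCount differs between the replicas.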