We are using Solr 7.7.3 and have a collection with 20 shards, each with 4 
replicas. We use the default BM25 similarity for scoring. When paging through 
search results, we would like the sort order to be deterministic so that we 
present consistent results and avoid skipping or duplicating results when 
paging up and down.

The problem we see is that scores differ depending on which replicas are hit 
and, because scores for different documents are often very similar, this can 
lead to results appearing in a different order for the same query.
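
To illustrate the effect, here is a minimal sketch with made-up scores. The 
document ids and score values are hypothetical and not taken from our index; 
`rank` is just a helper name for this example. It sorts by score descending 
with the id as a tie-breaker, and shows that a tie-breaker alone does not 
help when the scores themselves differ between replicas.

```python
def rank(scores):
    """Return doc ids ordered by score descending, then id ascending."""
    return [doc for doc, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical scores for the same query answered by two different replica sets.
replica_set_1 = {"doc-A": 12.3461, "doc-B": 12.3458, "doc-C": 9.81}
replica_set_2 = {"doc-A": 12.3452, "doc-B": 12.3457, "doc-C": 9.80}

order_1 = rank(replica_set_1)
order_2 = rank(replica_set_2)

print(order_1)
print(order_2)
print("same order:", order_1 == order_2)
```

The first two documents swap places even though each score moved by less 
than 0.001.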

Looking at the explain output, I can see that the docCount used in the idf 
calculation differs between replicas. I assume this is because the number of 
deleted documents on each replica is not identical. Because the idf differs, 
the scores differ slightly, and the order of results can therefore change.
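
As a rough sketch of why this matters, the idf term in Lucene's BM25 is 
ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The counts below are 
hypothetical, chosen only to mimic two replicas of one shard that hold a 
different number of not-yet-merged deleted documents; they are not figures 
from our explain output.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 idf as computed by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical statistics: replica B still counts a few hundred deleted
# documents that replica A has already merged away.
idf_a = bm25_idf(doc_count=1_000_000, doc_freq=1_200)
idf_b = bm25_idf(doc_count=1_000_450, doc_freq=1_203)

print(f"replica A idf: {idf_a:.6f}")
print(f"replica B idf: {idf_b:.6f}")
print(f"difference:    {abs(idf_a - idf_b):.6f}")
```

The difference is small, but it is enough to reorder documents whose scores 
are nearly equal.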

We're currently using the default statsCache, LocalStatsCache. We tried 
switching to LRUStatsCache, but that didn't seem to help: the document counts 
were still inconsistent.

Is there an approach we can use to guarantee consistent ordering while still 
using most or all of the BM25 scoring logic? Do later versions of Solr help 
with this issue at all?

Thanks for any advice.

Peter Lancaster
Software developer, Findmypast
peter.lancas...@findmypast.com



