Hello, Peter.
Why don't you use one of the Exact*StatsCache implementations
(ExactStatsCache or ExactSharedStatsCache)? I always thought they were
meant to solve this problem. I also found
https://issues.apache.org/jira/browse/SOLR-13257, which is about
introducing replica.base in 9.0, though I'm not sure whether it is a
solution.
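
To make the effect described in the quoted message concrete, here is a
minimal Python sketch of the Lucene-style BM25 idf term. The docCount and
docFreq values and the competing score are made-up numbers for
illustration only, not taken from any real index, and the sketch ignores
the term-frequency and length-normalization parts of the score.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene-style BM25 idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for two replicas of the same shard. The live
# documents are identical, but the counts differ (assumed here to be caused
# by a different number of not-yet-merged deleted documents).
idf_a = bm25_idf(doc_count=1_000_000, doc_freq=1_200)
idf_b = bm25_idf(doc_count=1_003_500, doc_freq=1_210)

# Hypothetical score of a near-tied document that lives on another shard.
other_shard_score = 6.7225


def ranks_first(idf: float) -> bool:
    """True if our document (score taken as idf * 1.0) beats the other shard's document."""
    return idf > other_shard_score


print(f"idf on replica A: {idf_a:.6f}, ranks first: {ranks_first(idf_a)}")
print(f"idf on replica B: {idf_b:.6f}, ranks first: {ranks_first(idf_b)}")
```

With these assumed numbers the idf difference is in the third decimal
place, which is enough to swap two near-tied documents depending on which
replica answers.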

On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <
peter.lancas...@findmypast.com> wrote:

> We are using Solr 7.7.3 and have a collection with 20 shards, each with 4
> replicas. We use the default BM25 similarity algorithm for scoring. For
> paging through search results, we would like the sort order to be
> deterministic so that we present consistent results and avoid skipping or
> duplicating results when paging up and down.
>
> The problem we see is that scores differ depending on which replicas are
> hit, and because scores for different documents are often very similar,
> this can lead to results appearing in a different order for the same
> query.
>
> Looking at the explain output, I can see that the docCount used in the
> calculation of the idf differs between replicas, and I assume this is
> because the number of deleted documents on each replica is not identical.
> Because the idf differs, the scores differ slightly, and the order of
> results can therefore change.
>
> We're currently using the default statsCache, LocalStatsCache. We have
> tried switching to LRUStatsCache, but that didn't seem to help, i.e. the
> document counts were still inconsistent.
>
> Is there an approach that we can use so that we can guarantee consistent
> ordering and still use most or all of the BM25 scoring logic? Do later
> versions of Solr help with this issue at all?
>
> Thanks for any advice.
>
> Peter Lancaster
> Software developer, Findmypast
> peter.lancas...@findmypast.com


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!