Hi Mikhail,

Thanks for the quick reply.

Just to say we've now tried the ExactStatsCache/ExactSharedStatsCache options,
but neither seems to help with the different docCounts/scores that are seen for
different replicas.
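
To illustrate the docCount effect described in the original message below, here is a minimal sketch using the idf formula from Lucene's BM25Similarity. The replica counts are made-up numbers, and holding docFreq constant across replicas is a simplifying assumption; in practice both values can differ.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """idf as in Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical figures: two replicas of one shard hold the same live documents,
# but replica B still carries 3,500 deleted documents that have not been merged away.
idf_replica_a = bm25_idf(doc_count=1_000_000, doc_freq=1_200)
idf_replica_b = bm25_idf(doc_count=1_003_500, doc_freq=1_200)

# The gap is tiny, but it is enough to reorder documents whose scores are nearly equal.
idf_gap = abs(idf_replica_a - idf_replica_b)
print(f"replica A idf={idf_replica_a:.6f}, replica B idf={idf_replica_b:.6f}, gap={idf_gap:.6f}")
```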

The link you posted looks more promising, as it may solve the issue and improve
performance as well. Unfortunately, it's not something that I can try out
straight away, so I can't tell you yet whether it works.

Thanks again,
Peter.

-----Original Message-----
From: Mikhail Khludnev <m...@apache.org>
Sent: 11 January 2023 09:52
To: users@solr.apache.org
Subject: Re: Inconsistent ordering of results


Hello, Peter.
Why don't you use Exact*StatsCache? I always thought that they could solve this 
problem. Also, I've found
 https://issues.apache.org/jira/browse/SOLR-13257 about introducing 
replica.base in 9.0. I'm not sure if it's a solution.
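
A rough sketch of what a request using that option might look like, assuming a Solr version that includes SOLR-13257 (so not 7.7.3). The `shards.preference=replica.base:stable` value is taken from the Solr Reference Guide; the host, collection name, and helper function are hypothetical, and whether stable routing is enough to make the ordering consistent is untested here.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, collection: str, query: str,
                     start: int = 0, rows: int = 10) -> str:
    """Build a /select URL that asks Solr to route to replicas in a stable order."""
    params = {
        "q": query,
        "start": start,
        "rows": rows,
        # Intended to send repeated requests to the same replica of each shard,
        # so that paging through one query sees the same index statistics.
        "shards.preference": "replica.base:stable",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, second page of results.
url = build_select_url("http://localhost:8983/solr", "records", "surname:smith", start=10)
print(url)
```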

On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster < 
peter.lancas...@findmypast.com> wrote:

> We are using Solr 7.7.3 and have a collection with 20 shards, each with
> 4 replicas. We use the default BM25 similarity algorithm for scoring.
> For paging through search results we would like the sort order to be
> deterministic to present consistent results and avoid skipping or
> duplicating results when paging up and down.
>
> The problem we see is that scores are different depending upon which
> replicas are hit and because scores for different documents are often
> very similar this can lead to results appearing in a different order
> for the same query.
>
> Looking at the explain output I can see that the docCount used in the
> calculation of the idf is different for different replicas and I
> assume this is because the number of deleted documents on each replica
> is not identical. Because the idf differs, the scores differ slightly
> and the order of results can therefore change.
>
> We're currently using a statsCache with the default value of
> LocalStatsCache. We have tried switching to LRUStatsCache, but that
> didn't seem to help, i.e. the document counts were still inconsistent.
>
> Is there an approach that we can use so that we can guarantee
> consistent ordering and still use most or all of the BM25 scoring
> logic? Do later versions of Solr help with this issue at all?
>
> Thanks for any advice.
>
> Peter Lancaster
> Software developer, Findmypast
> peter.lancas...@findmypast.com
>
> ________________________________
>
> This message is private and confidential. If you have received this
> message in error, please notify us immediately by emailing
> postmas...@findmypast.com and remove it from your system.
> This email is not intended to create legally binding obligations
> unless expressly stated otherwise. We accept no liability for the
> content of this email, or for the consequences of any actions taken
> based on the information provided, unless that information is
> subsequently confirmed in writing. Any views or opinions presented in
> this email are solely those of the author and do not necessarily
> represent those of the company. We have taken reasonable precautions
> to ensure that no viruses are contained in this email, but do not
> accept any responsibility once this email has been transmitted. You
> should ensure that the email and attachments (if any) are virus free.
> We may monitor email traffic data and also the content of email using data 
> loss prevention software for the purposes of data security.
>
> Findmypast
> Clerk's Court
> First Floor, 18-20 Farringdon Lane
> London
> EC1R 3AU
>
> Registered in England, no. 4369607
>


--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

