Are these documents tied, with exactly the same score? Ties can be ordered differently on different replicas. Using global IDF won’t fix that, and it was 10x slower when we tried it.
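A minimal sketch of the tie problem, and of how a secondary sort key removes it. The document ids and scores are made up, and the two lists stand in for the same tied documents coming back from two replicas in different internal order. In Solr the equivalent request parameter would be something like `sort=score desc, id asc`, assuming the uniqueKey field is named `id`.

```python
from typing import List, Tuple

Doc = Tuple[str, float]  # (id, score); hypothetical shape for illustration


def top_n(docs: List[Doc], n: int) -> List[str]:
    """Rank by score only. Python's sort is stable, so tied documents
    keep whatever order the replica happened to return them in."""
    ranked = sorted(docs, key=lambda d: -d[1])
    return [doc_id for doc_id, _ in ranked[:n]]


def top_n_tiebreak(docs: List[Doc], n: int) -> List[str]:
    """Rank by score descending, then by id ascending. The id is the
    same on every replica, so ties always resolve the same way."""
    ranked = sorted(docs, key=lambda d: (-d[1], d[0]))
    return [doc_id for doc_id, _ in ranked[:n]]


# Made-up data: identical documents and scores, different arrival order.
replica_a = [("doc7", 2.5), ("doc3", 2.5), ("doc9", 1.0)]
replica_b = [("doc3", 2.5), ("doc7", 2.5), ("doc9", 1.0)]

print(top_n(replica_a, 2), top_n(replica_b, 2))
print(top_n_tiebreak(replica_a, 2), top_n_tiebreak(replica_b, 2))
```

With score-only ranking the two replicas disagree about which tied document comes first; with the id tie-breaker both produce the same page.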
We fixed this by adding a sort by score, then id. The id is the same on all replicas, so that gives consistent ordering. Exact score ties are common with one word queries and short documents, like book or movie titles.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 11, 2023, at 4:10 AM, Peter Lancaster <peter.lancas...@findmypast.com> wrote:
>
> Hi Mikhail,
>
> Thanks for the quick reply.
>
> Just to say we've now tried the ExactStatsCache/ ExactSharedStatsCache
> options but neither seems to help with the different docCounts/scores that
> are seen for different replicas.
>
> The link you posted looks more promising as it may solve the issue and
> improve performance as well. Unfortunately it's not something that I can try
> out straightaway so I can't tell you if it works right now.
>
> Thanks again,
> Peter.
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: 11 January 2023 09:52
> To: users@solr.apache.org
> Subject: Re: Inconsistent ordering of results
>
> EXTERNAL SENDER: Do not click any links or open any attachments unless you
> trust the sender and know the content is safe.
>
>
> Hello, Peter.
> Why don't you use Exact*StatsCache? I always thought that they could solve
> this problem. Also, I've found
> https://issues.apache.org/jira/browse/SOLR-13257 about introducing
> replica.base in 9.0. I'm not sure if it's a solution.
>
> On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <peter.lancas...@findmypast.com> wrote:
>
>> We are using solr 7.7.3 and have a collection with 20 shards each with
>> 4 replicas. We use the default BM25 similarity algorithm for scoring.
>> For paging through search results we would like the sort order to be
>> deterministic to present consistent results and avoid skipping or
>> duplicating results when paging up and down.
>>
>> The problem we see is that scores are different depending upon which
>> replicas are hit and because scores for different documents are often
>> very similar this can lead to results appearing in a different order
>> for the same query.
>>
>> Looking at the explain output I can see that the docCount used in the
>> calculation of the idf is different for different replicas and I
>> assume this is because the number of deleted documents on each replica
>> is not identical. Because the idf is different then slightly different
>> scores result and the order of results can therefore be different.
>>
>> We're currently using a statsCache with the default value of
>> LocalStatsCache but have tried switching to LRUStatsCache but that
>> didn't seem to help, i.e. the document counts were still inconsistent.
>>
>> Is there an approach that we can use so that we can guarantee
>> consistent ordering and still use most of/all of the BM25 scoring
>> logic? Do later versions of solr help with this issue at all?
>>
>> Thanks for any advice.
>>
>> Peter Lancaster
>> Software developer, Findmypast
>> peter.lancas...@findmypast.com
>>
>> ________________________________
>>
>> This message is private and confidential. If you have received this
>> message in error, please notify us immediately by emailing
>> postmas...@findmypast.com and remove it from your system.
>> This email is not intended to create legally binding obligations
>> unless expressly stated otherwise. We accept no liability for the
>> content of this email, or for the consequences of any actions taken
>> based on the information provided, unless that information is
>> subsequently confirmed in writing. Any views or opinions presented in
>> this email are solely those of the author and do not necessarily
>> represent those of the company.
>> We have taken reasonable precautions
>> to ensure that no viruses are contained in this email, but do not
>> accept any responsibility once this email has been transmitted. You
>> should ensure that the email and attachments (if any) are virus free.
>> We may monitor email traffic data and also the content of email using data
>> loss prevention software for the purposes of data security.
>>
>> Findmypast
>> Clerk's Court
>> First Floor, 18-20 Farringdon Lane
>> London
>> EC1R 3AU
>>
>> Registered in England, no. 4369607
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!