I searched a little more and found these:
https://issues.apache.org/jira/browse/SOLR-13790?focusedCommentId=16942908&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16942908
https://stackoverflow.com/questions/55582874/exactstatscache-not-working-for-distributed-idf
On Wed, Jan 11, 2023 at 3:12 PM Peter Lancaster <peter.lancas...@findmypast.com> wrote:
> Hi Mikhail,
>
> Thanks for the quick reply.
>
> Just to say we've now tried the ExactStatsCache/ExactSharedStatsCache
> options, but neither seems to help with the different docCounts/scores that
> are seen for different replicas.
>
> The link you posted looks more promising, as it may solve the issue and
> improve performance as well. Unfortunately it's not something that I can
> try out straightaway, so I can't tell you if it works right now.
>
> Thanks again,
> Peter.
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: 11 January 2023 09:52
> To: users@solr.apache.org
> Subject: Re: Inconsistent ordering of results
>
> Hello, Peter.
> Why don't you use Exact*StatsCache? I always thought that they could solve
> this problem. Also, I've found
> https://issues.apache.org/jira/browse/SOLR-13257 about introducing
> replica.base in 9.0. I'm not sure if it's a solution.
>
> On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <
> peter.lancas...@findmypast.com> wrote:
>
> > We are using Solr 7.7.3 and have a collection with 20 shards, each with
> > 4 replicas. We use the default BM25 similarity algorithm for scoring.
> > For paging through search results we would like the sort order to be
> > deterministic, to present consistent results and avoid skipping or
> > duplicating results when paging up and down.
> >
> > The problem we see is that scores are different depending upon which
> > replicas are hit, and because scores for different documents are often
> > very similar, this can lead to results appearing in a different order
> > for the same query.
> >
> > Looking at the explain output I can see that the docCount used in the
> > calculation of the idf is different for different replicas, and I
> > assume this is because the number of deleted documents on each replica
> > is not identical. Because the idf differs, the scores differ slightly,
> > and the order of results can therefore be different.
> >
> > We're currently using a statsCache with the default value of
> > LocalStatsCache and have tried switching to LRUStatsCache, but that
> > didn't seem to help, i.e. the document counts were still inconsistent.
> >
> > Is there an approach that we can use so that we can guarantee
> > consistent ordering and still use most or all of the BM25 scoring
> > logic? Do later versions of Solr help with this issue at all?
> >
> > Thanks for any advice.
> >
> > Peter Lancaster
> > Software developer, Findmypast
> > peter.lancas...@findmypast.com
> >
> > ________________________________
> >
> > This message is private and confidential. If you have received this
> > message in error, please notify us immediately by emailing
> > postmas...@findmypast.com and remove it from your system.
> > This email is not intended to create legally binding obligations
> > unless expressly stated otherwise. We accept no liability for the
> > content of this email, or for the consequences of any actions taken
> > based on the information provided, unless that information is
> > subsequently confirmed in writing. Any views or opinions presented in
> > this email are solely those of the author and do not necessarily
> > represent those of the company. We have taken reasonable precautions
> > to ensure that no viruses are contained in this email, but do not
> > accept any responsibility once this email has been transmitted. You
> > should ensure that the email and attachments (if any) are virus free.
> > We may monitor email traffic data and also the content of email using
> > data loss prevention software for the purposes of data security.
> >
> > Findmypast
> > Clerk's Court
> > First Floor, 18-20 Farringdon Lane
> > London
> > EC1R 3AU
> >
> > Registered in England, no. 4369607
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
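The idf effect described in Peter's original message can be illustrated with a short calculation. This is a minimal Python sketch, not Solr or Lucene code. It assumes the BM25 idf formula that Lucene prints in explain output, log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), and it assumes, as Peter suspects, that replicas of the same shard report different docCount and docFreq values because deleted documents that have not yet been merged away are still counted. The replica statistics, the document ids, and the fixed per-document weight standing in for the rest of the BM25 formula are all hypothetical values chosen for illustration.

```python
import math
from typing import List

def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25 idf in the form Lucene reports in explain output:
    log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical statistics for two replicas of the same shard. Replica 2 is
# assumed to still count 20 deleted (not yet merged) documents, all of which
# contained the term "baker", so both docCount and docFreq("baker") are higher.
REPLICAS = {
    "replica1": {"docCount": 1000, "docFreq": {"smith": 100, "baker": 50}},
    "replica2": {"docCount": 1020, "docFreq": {"smith": 100, "baker": 70}},
}

# Hypothetical documents: each matches one term, with a fixed weight standing
# in for the tf and length-normalisation part of BM25 (identical on replicas).
DOCS = {"docA": ("smith", 1.2), "docB": ("baker", 1.0)}

def score(doc_id: str, replica: str) -> float:
    """Simplified score: idf computed from this replica's stats times a fixed weight."""
    term, tf_weight = DOCS[doc_id]
    stats = REPLICAS[replica]
    return bm25_idf(stats["docFreq"][term], stats["docCount"]) * tf_weight

def ranking(replica: str) -> List[str]:
    """Order documents by descending score, breaking exact ties by id."""
    return sorted(DOCS, key=lambda d: (-score(d, replica), d))

for replica in REPLICAS:
    print(replica, [(d, round(score(d, replica), 4)) for d in ranking(replica)])
```

With these made-up numbers, replica1 ranks docB above docA while replica2 ranks docA above docB, so a query that is routed to different replicas on successive page requests can return the same two documents in a different order, even though neither document changed.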