Hi Markus,

Thank you for your response.
I had forgotten to mention that we are using NRT rather than TLOG replicas. It sounds like switching to TLOG replicas is exactly the right thing to do to fix this.

Thanks again for your help.

Peter.

-----Original Message-----
From: Markus Jelsma <markus.jel...@openindex.io>
Sent: 11 January 2023 12:58
To: users@solr.apache.org
Subject: Re: Inconsistent ordering of results

EXTERNAL SENDER: Do not click any links or open any attachments unless you trust the sender and know the content is safe.

Hello Peter,

We had the same problem many years ago: replicas of the same shard had different stats. It was solved by introducing ExactStatsCache, which was a little slower, but not by much. When Solr introduced new replica types we switched all shards from NRT to TLOG. TLOG replicas are similar to the old master/slave type replicas: they copy over whole segments. Consider using the TLOG replica type.

Regards,
Markus

On Wed, 11 Jan 2023 at 12:33, Mikhail Khludnev <m...@apache.org> wrote:

> Searched a little bit more
>
> https://issues.apache.org/jira/browse/SOLR-13790?focusedCommentId=16942908&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16942908
>
> https://stackoverflow.com/questions/55582874/exactstatscache-not-working-for-distributed-idf
>
> On Wed, Jan 11, 2023 at 3:12 PM Peter Lancaster <
> peter.lancas...@findmypast.com> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for the quick reply.
> >
> > Just to say we've now tried the ExactStatsCache/
> > ExactSharedStatsCache options, but neither seems to help with the
> > different docCounts/scores that are seen for different replicas.
> >
> > The link you posted looks more promising, as it may solve the issue
> > and improve performance as well. Unfortunately it's not something
> > that I can try out straightaway, so I can't tell you if it works
> > right now.
> >
> > Thanks again,
> > Peter.
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: 11 January 2023 09:52
> > To: users@solr.apache.org
> > Subject: Re: Inconsistent ordering of results
> >
> > Hello, Peter.
> > Why don't you use Exact*StatsCache? I always thought that they could
> > solve this problem. Also, I've found
> > https://issues.apache.org/jira/browse/SOLR-13257
> > about introducing replica.base in 9.0. I'm not sure if it's a solution.
> >
> > On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <
> > peter.lancas...@findmypast.com> wrote:
> >
> > > We are using Solr 7.7.3 and have a collection with 20 shards, each
> > > with 4 replicas. We use the default BM25 similarity algorithm for
> > > scoring.
> > >
> > > For paging through search results we would like the sort order to
> > > be deterministic, so that we present consistent results and avoid
> > > skipping or duplicating results when paging up and down.
> > >
> > > The problem we see is that scores differ depending upon which
> > > replicas are hit, and because scores for different documents are
> > > often very similar, this can lead to results appearing in a
> > > different order for the same query.
> > >
> > > Looking at the explain output I can see that the docCount used in
> > > the calculation of the idf is different for different replicas, and
> > > I assume this is because the number of deleted documents on each
> > > replica is not identical. Because the idf differs, slightly
> > > different scores result and the order of results can therefore be
> > > different.
> > >
> > > We're currently using a statsCache with the default value of
> > > LocalStatsCache. We have tried switching to LRUStatsCache, but that
> > > didn't seem to help, i.e. the document counts were still
> > > inconsistent.
> > >
> > > Is there an approach that we can use so that we can guarantee
> > > consistent ordering and still use most or all of the BM25 scoring
> > > logic? Do later versions of Solr help with this issue at all?
> > >
> > > Thanks for any advice.
> > >
> > > Peter Lancaster
> > > Software developer, Findmypast
> > > peter.lancas...@findmypast.com
> > >
> > > ________________________________
> > >
> > > This message is private and confidential. If you have received
> > > this message in error, please notify us immediately by emailing
> > > postmas...@findmypast.com and remove it from your system.
> > > This email is not intended to create legally binding obligations
> > > unless expressly stated otherwise.
> > > We accept no liability for the content of this email, or for the
> > > consequences of any actions taken based on the information
> > > provided, unless that information is subsequently confirmed in
> > > writing. Any views or opinions presented in this email are solely
> > > those of the author and do not necessarily represent those of the
> > > company. We have taken reasonable precautions to ensure that no
> > > viruses are contained in this email, but do not accept any
> > > responsibility once this email has been transmitted. You should
> > > ensure that the email and attachments (if any) are virus free.
> > > We may monitor email traffic data and also the content of email
> > > using data loss prevention software for the purposes of data
> > > security.
> > >
> > > Findmypast
> > > Clerk's Court
> > > First Floor, 18-20 Farringdon Lane
> > > London EC1R 3AU
> > >
> > > Registered in England, no. 4369607
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
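
A footnote to the docCount explanation in the original question: the sketch below is an illustration only and is not taken from the thread. It assumes Lucene's BM25 idf formula, ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), with the default k1 = 1.2 and b = 0.75, and it assumes that deleted documents keep contributing to docCount and docFreq until their segments are merged. All statistics, document names and helper names (REPLICA_A1, REPLICA_A2, SHARD_B, rank) are made up for the example.

```python
import math

K1, B = 1.2, 0.75  # Lucene's default BM25 parameters


def bm25_idf(doc_count, doc_freq):
    """BM25 idf as computed by Lucene's BM25Similarity."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(freq, doc_len, avg_len, doc_count, doc_freq):
    """Single-term BM25 score. The constant (k1 + 1) factor used by
    Lucene 7 is omitted because it does not affect ordering."""
    tf = freq / (freq + K1 * (1.0 - B + B * doc_len / avg_len))
    return bm25_idf(doc_count, doc_freq) * tf


# Hypothetical local statistics. Two NRT replicas of the same shard hold
# the same live documents, but A2 has not yet merged away its deletes.
REPLICA_A1 = {"doc_count": 1_000_000, "doc_freq": 1_000}
REPLICA_A2 = {"doc_count": 1_050_000, "doc_freq": 1_100}
SHARD_B = {"doc_count": 1_000_000, "doc_freq": 1_000}


def rank(shard_a_stats):
    """Rank document X (on shard A) against document Y (on shard B),
    each scored with the local stats of the replica that answered."""
    scores = {
        "X": bm25_score(2, 100, 100, **shard_a_stats),
        "Y": bm25_score(2, 102, 100, **SHARD_B),
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank(REPLICA_A1))  # X scored by replica A1
print(rank(REPLICA_A2))  # X scored by replica A2
```

With these made-up numbers the two documents swap places depending on which replica of shard A serves the request, which is the paging symptom described in the question. Replicas that copy whole segments from the leader, as TLOG replicas do, would be expected to report matching statistics once replication has caught up.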