Hello Peter,

We had the same problem many years ago: replicas of the same shard had different stats. We solved it by introducing ExactStatsCache, which was slightly slower, but not by much. When Solr introduced new replica types, we switched all shards from NRT to TLOG. TLOG replicas are similar to the old master/slave replicas: they copy over whole segments, so the replicas of a shard end up with the same segments and therefore the same stats.
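
For illustration, here is a minimal Python sketch of how TLOG replicas could be requested through the Collections API. The host, the collection name `records`, and both helper function names are assumptions made up for this example; `tlogReplicas`, `nrtReplicas` and `type=tlog` are Collections API parameters that have existed since Solr 7.0. The shard and replica counts mirror the 20 x 4 layout from the original question. The sketch only builds the request URLs and sends nothing.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "records"


def create_tlog_collection_url(base, name, num_shards, replicas_per_shard):
    """Build a Collections API CREATE URL that asks for TLOG replicas only."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,                    # no NRT replicas
        "tlogReplicas": replicas_per_shard,  # every replica is TLOG
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def add_tlog_replica_url(base, collection, shard):
    """Build an ADDREPLICA URL that adds one TLOG replica to a shard."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "type": "tlog",
    }
    return f"{base}/admin/collections?{urlencode(params)}"


create_url = create_tlog_collection_url(SOLR_BASE, COLLECTION, 20, 4)
add_url = add_tlog_replica_url(SOLR_BASE, COLLECTION, "shard1")
print(create_url)
print(add_url)
```

For an existing collection, one possible route (not tested here) would be to add TLOG replicas shard by shard with ADDREPLICA and remove the NRT replicas afterwards.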
Consider using the TLOG type replica.

Regards,
Markus

On Wed, 11 Jan 2023 at 12:33, Mikhail Khludnev <m...@apache.org> wrote:

> Searched a little bit more
>
> https://issues.apache.org/jira/browse/SOLR-13790?focusedCommentId=16942908&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16942908
>
> https://stackoverflow.com/questions/55582874/exactstatscache-not-working-for-distributed-idf
>
> On Wed, Jan 11, 2023 at 3:12 PM Peter Lancaster <peter.lancas...@findmypast.com> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for the quick reply.
> >
> > Just to say we've now tried the ExactStatsCache/ExactSharedStatsCache
> > options but neither seems to help with the different docCounts/scores that
> > are seen for different replicas.
> >
> > The link you posted looks more promising as it may solve the issue and
> > improve performance as well. Unfortunately it's not something that I can
> > try out straightaway so I can't tell you if it works right now.
> >
> > Thanks again,
> > Peter.
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: 11 January 2023 09:52
> > To: users@solr.apache.org
> > Subject: Re: Inconsistent ordering of results
> >
> > Hello, Peter.
> > Why don't you use Exact*StatsCache? I always thought that they could solve
> > this problem. Also, I've found
> > https://issues.apache.org/jira/browse/SOLR-13257 about introducing
> > replica.base in 9.0. I'm not sure if it's a solution.
> >
> > On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <peter.lancas...@findmypast.com> wrote:
> >
> > > We are using solr 7.7.3 and have a collection with 20 shards each with
> > > 4 replicas. We use the default BM25 similarity algorithm for scoring.
> > >
> > > For paging through search results we would like the sort order to be
> > > deterministic to present consistent results and avoid skipping or
> > > duplicating results when paging up and down.
> > >
> > > The problem we see is that scores are different depending upon which
> > > replicas are hit and because scores for different documents are often
> > > very similar this can lead to results appearing in a different order
> > > for the same query.
> > >
> > > Looking at the explain output I can see that the docCount used in the
> > > calculation of the idf is different for different replicas and I
> > > assume this is because the number of deleted documents on each replica
> > > is not identical. Because the idf is different then slightly different
> > > scores result and the order of results can therefore be different.
> > >
> > > We're currently using a statsCache with the default value of
> > > LocalStatsCache but have tried switching to LRUStatsCache but that
> > > didn't seem to help, i.e. the document counts were still inconsistent.
> > >
> > > Is there an approach that we can use so that we can guarantee
> > > consistent ordering and still use most of/all of the BM25 scoring
> > > logic? Do later versions of solr help with this issue at all?
> > >
> > > Thanks for any advice.
> > >
> > > Peter Lancaster
> > > Software developer, Findmypast
> > > peter.lancas...@findmypast.com
> > >
> > > ________________________________
> > >
> > > This message is private and confidential. If you have received this
> > > message in error, please notify us immediately by emailing
> > > postmas...@findmypast.com and remove it from your system.
> > > This email is not intended to create legally binding obligations
> > > unless expressly stated otherwise.
> > > We accept no liability for the
> > > content of this email, or for the consequences of any actions taken
> > > based on the information provided, unless that information is
> > > subsequently confirmed in writing. Any views or opinions presented in
> > > this email are solely those of the author and do not necessarily
> > > represent those of the company. We have taken reasonable precautions
> > > to ensure that no viruses are contained in this email, but do not
> > > accept any responsibility once this email has been transmitted. You
> > > should ensure that the email and attachments (if any) are virus free.
> > > We may monitor email traffic data and also the content of email using
> > > data loss prevention software for the purposes of data security.
> > >
> > > Findmypast
> > > Clerk's Court
> > > First Floor, 18-20 Farringdon Lane
> > > London
> > > EC1R 3AU
> > >
> > > Registered in England, no. 4369607
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
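
To illustrate the effect described in the original question, here is a small Python sketch of the idf term used by Lucene's BM25Similarity, idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The docFreq and docCount numbers are invented for the example (they are not taken from Peter's cluster), and the function name is hypothetical. It shows that two replicas of the same shard that differ only in docCount, for instance because one still holds deleted documents that have not been merged away, produce slightly different idf values and hence slightly different scores.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """idf term in the form used by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Invented numbers: same term, same shard, two replicas whose docCount
# differs because of deleted documents that are not yet merged away.
idf_replica_a = bm25_idf(doc_freq=1_000, doc_count=5_000_000)
idf_replica_b = bm25_idf(doc_freq=1_000, doc_count=5_020_000)

print(f"replica A idf: {idf_replica_a:.6f}")
print(f"replica B idf: {idf_replica_b:.6f}")
print(f"difference:    {idf_replica_b - idf_replica_a:.6f}")
```

A difference this small only matters when two documents have nearly equal scores, which is the situation described in the original question.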