Hello Peter,

We had the same problem many years ago: replicas of the same shard had
different stats. It was solved with ExactStatsCache, which was a little
slower, but not by much. When Solr introduced the new replica types, we
switched all shards from NRT to TLOG. TLOG replicas are similar to the old
master/slave replicas: they copy over whole segments from the leader, so once
replication has caught up, all replicas of a shard have the same segments and
therefore the same stats.
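
To illustrate why the stats differ: as far as I know, the BM25Similarity in
Lucene 7.x (used by Solr 7.7) computes idf as
log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), and deleted documents
keep counting towards docCount and docFreq until a merge removes them. NRT
replicas index and merge independently, so replicas of one shard can report
different numbers. A minimal Python sketch of that formula follows; the term
statistics are made-up values for illustration, not numbers from a real index.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """idf as computed by Lucene's BM25Similarity (assumed 7.x formula)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical stats for the same term on two replicas of one shard.
# Replica B has merged away more deleted documents, so its docCount is lower.
replica_a = bm25_idf(doc_freq=1_000, doc_count=100_000)
replica_b = bm25_idf(doc_freq=1_000, doc_count=98_000)
print(f"replica A idf={replica_a:.6f}, replica B idf={replica_b:.6f}")
```

The difference is small, but when two documents score almost identically it
is enough to swap their order depending on which replica answered.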

Consider using the TLOG replica type.
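
For a new collection, the Collections API CREATE action accepts per-type
replica counts (nrtReplicas, tlogReplicas, pullReplicas). Below is a rough
Python sketch that only builds the request URL; the host, collection name and
config set name are hypothetical placeholders. On 7.x you may also need to set
maxShardsPerNode if 80 cores don't fit on your nodes under the default.

```python
from urllib.parse import urlencode


def tlog_create_url(base_url: str, name: str, num_shards: int,
                    tlog_replicas: int, config_name: str) -> str:
    """Build a Collections API CREATE URL that uses only TLOG replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # No NRT replicas: every replica is TLOG, so non-leader replicas
        # copy segments from the shard leader instead of indexing themselves.
        "nrtReplicas": 0,
        "tlogReplicas": tlog_replicas,
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host, collection and config set names.
url = tlog_create_url("http://localhost:8983/solr", "records", 20, 4, "records_conf")
print(url)
```

You would then send a GET to that URL (for example with
urllib.request.urlopen) against your own cluster. For an existing collection,
ADDREPLICA with type=TLOG followed by removing the NRT replicas is another
route, though I would try that on a test cluster first.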

Regards,
Markus

On Wed 11 Jan 2023 at 12:33, Mikhail Khludnev <m...@apache.org> wrote:

> I searched a little more and found these:
>
> https://issues.apache.org/jira/browse/SOLR-13790?focusedCommentId=16942908&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16942908
>
> https://stackoverflow.com/questions/55582874/exactstatscache-not-working-for-distributed-idf
>
>
> On Wed, Jan 11, 2023 at 3:12 PM Peter Lancaster <
> peter.lancas...@findmypast.com> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for the quick reply.
> >
> > Just to say we've now tried the ExactStatsCache and ExactSharedStatsCache
> > options, but neither seems to help with the different docCounts/scores
> > that are seen for different replicas.
> >
> > The link you posted looks more promising as it may solve the issue and
> > improve performance as well. Unfortunately it's not something that I can
> > try out straightaway so I can't tell you if it works right now.
> >
> > Thanks again,
> > Peter.
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: 11 January 2023 09:52
> > To: users@solr.apache.org
> > Subject: Re: Inconsistent ordering of results
> >
> > Hello, Peter.
> > Why don't you use Exact*StatsCache? I always thought that they could
> > solve this problem. Also, I've found
> > https://issues.apache.org/jira/browse/SOLR-13257 about introducing
> > replica.base in 9.0. I'm not sure if it's a solution.
> >
> > On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <
> > peter.lancas...@findmypast.com> wrote:
> >
> > > We are using Solr 7.7.3 and have a collection with 20 shards, each with
> > > 4 replicas. We use the default BM25 similarity algorithm for scoring.
> > > When paging through search results, we would like the sort order to be
> > > deterministic, so that results are consistent and nothing is skipped or
> > > duplicated when paging up and down.
> > >
> > > The problem we see is that scores differ depending on which replicas
> > > are hit, and because scores for different documents are often very
> > > similar, this can lead to results appearing in a different order for
> > > the same query.
> > >
> > > Looking at the explain output, I can see that the docCount used in the
> > > idf calculation differs between replicas, and I assume this is because
> > > the number of deleted documents on each replica is not identical.
> > > Because the idf differs, the scores differ slightly, and so the order
> > > of results can differ too.
> > >
> > > We're currently using the default statsCache, LocalStatsCache. We tried
> > > switching to LRUStatsCache, but that didn't seem to help: the document
> > > counts were still inconsistent.
> > >
> > > Is there an approach we can use to guarantee consistent ordering while
> > > still using most or all of the BM25 scoring logic? Do later versions of
> > > Solr help with this issue at all?
> > >
> > > Thanks for any advice.
> > >
> > > Peter Lancaster
> > > Software developer, Findmypast
> > > peter.lancas...@findmypast.com
> > >
> > >
> > >
> > >
> > > ________________________________
> > >
> > > This message is private and confidential. If you have received this
> > > message in error, please notify us immediately by emailing
> > > postmas...@findmypast.com and remove it from your system.
> > > This email is not intended to create legally binding obligations
> > > unless expressly stated otherwise. We accept no liability for the
> > > content of this email, or for the consequences of any actions taken
> > > based on the information provided, unless that information is
> > > subsequently confirmed in writing. Any views or opinions presented in
> > > this email are solely those of the author and do not necessarily
> > > represent those of the company. We have taken reasonable precautions
> > > to ensure that no viruses are contained in this email, but do not
> > > accept any responsibility once this email has been transmitted. You
> > > should ensure that the email and attachments (if any) are virus free.
> > > We may monitor email traffic data and also the content of email using
> > > data loss prevention software for the purposes of data security.
> > >
> > > Findmypast
> > > Clerk's Court
> > > First Floor, 18-20 Farringdon Lane
> > > London
> > > EC1R 3AU
> > >
> > > Registered in England, no. 4369607
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
> >
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>
