I searched a little more and found these:
https://issues.apache.org/jira/browse/SOLR-13790?focusedCommentId=16942908&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16942908
https://stackoverflow.com/questions/55582874/exactstatscache-not-working-for-distributed-idf


On Wed, Jan 11, 2023 at 3:12 PM Peter Lancaster <
peter.lancas...@findmypast.com> wrote:

> Hi Mikhail,
>
> Thanks for the quick reply.
>
> Just to say we've now tried the ExactStatsCache and ExactSharedStatsCache
> options, but neither seems to help with the different docCounts and scores
> that are seen for different replicas.
>
> The link you posted looks more promising as it may solve the issue and
> improve performance as well. Unfortunately it's not something that I can
> try out straightaway so I can't tell you if it works right now.
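
For when you get a chance to try it: a minimal sketch of passing the
replica.base preference from SOLR-13257 as a request parameter, assuming a
Solr version that includes that change and accepts replica.base:stable in
shards.preference. The host, collection name, query, and the unique key
field "id" are hypothetical placeholders, not details from your setup. As I
understand it, a stable base should route a repeated query to the same
replica of each shard, so the same per-replica statistics are used while
paging; whether that resolves the ordering differences here is untested.

```python
from urllib.parse import urlencode

# Hypothetical host and collection; adjust to the real cluster.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_query_url(q: str, start: int, rows: int) -> str:
    """Build a select URL that asks for a stable replica ordering."""
    params = {
        "q": q,
        "start": start,
        "rows": rows,
        # Assumption: the target Solr version supports replica.base.
        # "stable" requests a repeatable replica order instead of a random one.
        "shards.preference": "replica.base:stable",
        # Assumption: "id" is the unique key; it breaks ties between equal scores.
        "sort": "score desc, id asc",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


page_one = build_query_url("name:smith", start=0, rows=20)
page_two = build_query_url("name:smith", start=20, rows=20)
print(page_one)
print(page_two)
```

The tie-breaker in the sort only helps when scores are exactly equal; it
does not by itself remove score differences between replicas.
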
>
> Thanks again,
> Peter.
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: 11 January 2023 09:52
> To: users@solr.apache.org
> Subject: Re: Inconsistent ordering of results
>
> Hello, Peter.
> Why don't you use Exact*StatsCache? I always thought that they could solve
> this problem. Also, I found
> https://issues.apache.org/jira/browse/SOLR-13257, which is about
> introducing replica.base in 9.0. I'm not sure whether it's a solution.
>
> On Wed, Jan 11, 2023 at 12:21 PM Peter Lancaster <
> peter.lancas...@findmypast.com> wrote:
>
> > We are using Solr 7.7.3 and have a collection with 20 shards, each with
> > 4 replicas. We use the default BM25 similarity algorithm for scoring.
> > For paging through search results, we would like the sort order to be
> > deterministic, so that we present consistent results and avoid skipping
> > or duplicating results when paging up and down.
> >
> > The problem we see is that scores differ depending upon which
> > replicas are hit. Because scores for different documents are often
> > very similar, this can lead to results appearing in a different order
> > for the same query.
> >
> > Looking at the explain output, I can see that the docCount used in the
> > calculation of the idf differs between replicas. I assume this is
> > because the number of deleted documents on each replica is not
> > identical. Because the idf differs, the scores differ slightly, and
> > the order of results can therefore change.
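
To illustrate the effect described above, here is a minimal sketch of the
idf term as computed by Lucene's BM25Similarity,
ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The docFreq and
docCount values are made-up numbers, not figures from this thread; they
only show how two replicas of one shard with different docCount values end
up with slightly different idf values for the same term.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """idf term of BM25: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for the same term on two replicas of one shard.
# The replicas hold the same live documents but report different docCount.
replica_a = bm25_idf(doc_freq=1_200, doc_count=1_000_000)
replica_b = bm25_idf(doc_freq=1_200, doc_count=1_000_350)

print(f"replica A idf: {replica_a:.6f}")
print(f"replica B idf: {replica_b:.6f}")
print(f"difference:    {replica_b - replica_a:.6f}")
```

The difference is small, but it is enough to swap two documents whose
scores are nearly equal.
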
> >
> > We currently use the default statsCache, LocalStatsCache. We tried
> > switching to LRUStatsCache, but that didn't seem to help; i.e., the
> > document counts were still inconsistent.
> >
> > Is there an approach that we can use so that we can guarantee
> > consistent ordering and still use most or all of the BM25 scoring
> > logic? Do later versions of Solr help with this issue at all?
> >
> > Thanks for any advice.
> >
> > Peter Lancaster
> > Software developer, Findmypast
> > peter.lancas...@findmypast.com
> >
> >
> >
> >
> > ________________________________
> >
> > This message is private and confidential. If you have received this
> > message in error, please notify us immediately by emailing
> > postmas...@findmypast.com and remove it from your system.
> > This email is not intended to create legally binding obligations
> > unless expressly stated otherwise. We accept no liability for the
> > content of this email, or for the consequences of any actions taken
> > based on the information provided, unless that information is
> > subsequently confirmed in writing. Any views or opinions presented in
> > this email are solely those of the author and do not necessarily
> > represent those of the company. We have taken reasonable precautions
> > to ensure that no viruses are contained in this email, but do not
> > accept any responsibility once this email has been transmitted. You
> > should ensure that the email and attachments (if any) are virus free.
> > We may monitor email traffic data and also the content of email using
> > data loss prevention software for the purposes of data security.
> >
> > Findmypast
> > Clerk's Court
> > First Floor, 18-20 Farringdon Lane
> > London
> > EC1R 3AU
> >
> > Registered in England, no. 4369607
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
