I opened https://issues.apache.org/jira/browse/LUCENE-7517.

On Fri, Oct 21, 2016 at 14:52, Robert Muir <[email protected]> wrote:

> The problem is more than worth it. The alternative is to remove the
> optimization? I don't think being incorrect / adding leniency to tests
> is a valid option at all. In general, if we don't apply a general fix,
> it will just make more such optimizations harder: more Jenkins
> failures, more deltas in tests, just a bad direction.
>
> I guess what I propose is something more like: change Scorer.score()
> to return double, and use double precision internally in all scoring
> (also similarity code).
>
> But keep it a float in e.g. ScoreDoc/TopDocs: we just "export" that to
> the user at the end. This is really best practice anyway, we shouldn't
> be storing intermediate calculations as 32-bit floats. It would just
> be a generalization of what DisjunctionSumScorer etc. are already
> doing.
>
>
> On Fri, Oct 21, 2016 at 8:34 AM, Adrien Grand <[email protected]> wrote:
> > I suspect we could do something on the Scorer API indeed, e.g. by giving
> > scorers a way to expose the double value of the score. However it's not
> > clear to me that this problem is worth making the Scorer API more
> > complex?
> >
> > On Fri, Oct 21, 2016 at 12:37, Robert Muir <[email protected]> wrote:
> >>
> >> But maybe the old "trick" can still be used somehow: just means using
> >> double precision internally to erase most differences? Maybe it means
> >> a change to the Scorer API or whatever, but still I think it's a good
> >> practical solution (vs something more extreme like Kahan summation). I
> >> am sure it does not work if someone has like 500k boolean clauses or
> >> for more extreme cases, but it prevents these problems for typical
> >> cases like keyword searches.
> >>
> >>
> >> On Fri, Oct 21, 2016 at 6:31 AM, Adrien Grand <[email protected]> wrote:
> >> > On Fri, Oct 21, 2016 at 12:20, Robert Muir <[email protected]> wrote:
> >> >>
> >> >> What changed?
> >> >
> >> >
> >> > The issue here is ReqOptSumScorer, which computes the score of the
> >> > MUST and SHOULD clauses separately and then sums them up. In that
> >> > test case, in one case body:d is in the list of SHOULD clauses, and
> >> > in the other case it is in the list of MUST clauses.
> >> >
> >> > For the same reason, "+a b", "+a +b" and "a +b" may return different
> >> > scores on the same documents.
> >> >
> >> > I can undo the change if you think this is a blocker, but that would
> >> > be disappointing as it would mean that we cannot do other exciting
> >> > changes like flattening nested disjunctions since it would cause the
> >> > same problem.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
>

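The precision point discussed in the thread can be sketched in a few lines of Java. This is only an illustration, not Lucene code: the class name, the method names and the sub-score values are all made up, and the values are chosen to sit at the edge of float precision so that the effect is easy to see. It sums the MUST and SHOULD sub-scores separately, as described for ReqOptSumScorer, once with float accumulators and once with double accumulators that are narrowed to float only at the end.

```java
// Hypothetical sketch, not Lucene code: names and values are assumptions.
final class ScorePrecisionSketch {

    // Sum each clause group in 32-bit floats, then add the partial sums.
    static float sumGroupedAsFloat(float[] must, float[] should) {
        float mustSum = 0f;
        for (float s : must) {
            mustSum += s;
        }
        float shouldSum = 0f;
        for (float s : should) {
            shouldSum += s;
        }
        return mustSum + shouldSum;
    }

    // Same grouping, but accumulate in doubles and narrow to float once,
    // at the point where the score would be handed to the user.
    static float sumGroupedAsDouble(float[] must, float[] should) {
        double mustSum = 0d;
        for (float s : must) {
            mustSum += s;
        }
        double shouldSum = 0d;
        for (float s : should) {
            shouldSum += s;
        }
        return (float) (mustSum + shouldSum);
    }

    public static void main(String[] args) {
        // Made-up sub-scores: 2^24 is where adjacent floats are 2 apart.
        float a = 16777216f, b = 1f, c = 1f;

        // Same three sub-scores, grouped as "+a +b c" and as "+a b c".
        float f1 = sumGroupedAsFloat(new float[] {a, b}, new float[] {c});
        float f2 = sumGroupedAsFloat(new float[] {a}, new float[] {b, c});
        float d1 = sumGroupedAsDouble(new float[] {a, b}, new float[] {c});
        float d2 = sumGroupedAsDouble(new float[] {a}, new float[] {b, c});

        System.out.println("float accumulation agrees:  " + (f1 == f2));
        System.out.println("double accumulation agrees: " + (d1 == d2));
    }
}
```

With these particular values the float version depends on which group a clause falls into, while the double version does not. As noted in the thread, double accumulation only makes such differences much rarer; it does not remove them for very large numbers of clauses.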