I opened https://issues.apache.org/jira/browse/LUCENE-7517.
On Fri, Oct 21, 2016 at 14:52, Robert Muir <[email protected]> wrote:

> The problem is more than worth it. The alternative is to remove the
> optimization? I don't think being incorrect / adding leniency to tests
> is a valid option at all. In general, if we don't apply a general fix,
> it will just make more such optimizations harder: more Jenkins
> failures, more deltas in tests, just a bad direction.
>
> I guess what I propose is something more like: change Scorer.score()
> to return double, and use double precision internally in all scoring
> (also similarity code).
>
> But keep it a float in e.g. ScoreDoc/TopDocs: we just "export" that to
> the user at the end. This is really best practice anyway, we shouldn't
> be storing intermediate calculations as 32-bit floats. It would just
> be a generalization of what DisjunctionSumScorer etc. are already
> doing.
>
> On Fri, Oct 21, 2016 at 8:34 AM, Adrien Grand <[email protected]> wrote:
> > I suspect we could do something on the Scorer API indeed, e.g. by
> > giving scorers a way to expose the double value of the score. However
> > it's not clear to me that this problem is worth making the Scorer API
> > more complex?
> >
> > On Fri, Oct 21, 2016 at 12:37, Robert Muir <[email protected]> wrote:
> > >
> > > But maybe the old "trick" can still be used somehow: it just means
> > > using double precision internally to erase most differences? Maybe
> > > it means a change to the Scorer API or whatever, but still I think
> > > it's a good practical solution (vs something more extreme like
> > > Kahan summation). I am sure it does not work if someone has like
> > > 500k boolean clauses or for more extreme cases, but it prevents
> > > these problems for typical cases like keyword searches.
> > >
> > > On Fri, Oct 21, 2016 at 6:31 AM, Adrien Grand <[email protected]> wrote:
> > > > On Fri, Oct 21, 2016 at 12:20, Robert Muir <[email protected]> wrote:
> > > > >
> > > > > What changed?
> > > >
> > > > The issue here is ReqOptSumScorer, which computes the score of
> > > > the MUST and SHOULD clauses separately and then sums them up. In
> > > > that test case, in one case body:d is in the list of SHOULD
> > > > clauses, and in the other case it is in the list of MUST clauses.
> > > >
> > > > For the same reason, "+a b", "+a +b" and "a +b" may return
> > > > different scores on the same documents.
> > > >
> > > > I can undo the change if you think this is a blocker, but that
> > > > would be disappointing as it would mean that we cannot do other
> > > > exciting changes like flattening nested disjunctions since it
> > > > would cause the same problem.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
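
For anyone following along, here is a small standalone sketch of the effect discussed in the thread above. It is not Lucene code: the class and method names (ScoreSummation, sumGroupsAsFloat, sumGroupsAsDouble) are made up for illustration, and the score values are deliberately chosen at the edge of float precision so the rounding is visible with only three clauses. It sums a MUST group and a SHOULD group separately and then adds the two, first with float accumulators and then with double accumulators that are narrowed to float only at the end.

```java
// Illustrative only: hypothetical helpers, not part of the Lucene API.
class ScoreSummation {

    // Sum each clause group in 32-bit float, then add the two partial sums.
    static float sumGroupsAsFloat(float[] must, float[] should) {
        float req = 0f;
        for (float s : must) {
            req += s;
        }
        float opt = 0f;
        for (float s : should) {
            opt += s;
        }
        return req + opt;
    }

    // Same grouping, but accumulate in double and narrow to float once at the end.
    static float sumGroupsAsDouble(float[] must, float[] should) {
        double req = 0d;
        for (float s : must) {
            req += s;
        }
        double opt = 0d;
        for (float s : should) {
            opt += s;
        }
        return (float) (req + opt);
    }

    public static void main(String[] args) {
        // Assumed per-clause scores: 2^24 is where float spacing becomes 2,
        // so adding 1 to it is lost to rounding.
        float a = 16777216f, b = 1f, c = 1f;

        // "+a b c": a is MUST, b and c are SHOULD.
        float f1 = sumGroupsAsFloat(new float[] {a}, new float[] {b, c});
        // "+a +b c": b moved from the SHOULD group to the MUST group.
        float f2 = sumGroupsAsFloat(new float[] {a, b}, new float[] {c});
        System.out.println("float accumulators agree:  " + (f1 == f2));

        float d1 = sumGroupsAsDouble(new float[] {a}, new float[] {b, c});
        float d2 = sumGroupsAsDouble(new float[] {a, b}, new float[] {c});
        System.out.println("double accumulators agree: " + (d1 == d2));
    }
}
```

With these inputs the float version gives two different totals depending on which group b lands in, while the double version gives the same float in both cases. As noted in the thread, double accumulation only hides the problem for typical clause counts; it does not make the sum order-independent in general.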
