[ https://issues.apache.org/jira/browse/LUCENE-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Elschot updated LUCENE-6894:
---------------------------------
Attachment: LUCENE-6894.patch
Patch of 11 Nov 2015.
Most of the changes are to pass numDocs down to where it is actually used:
ConjunctionDISI, DisjunctionDISIApproximation, DisjunctionScorer,
ConjunctionSpans, SpanOrQuery.
This is incomplete, and there are no tests.
In MinShouldMatchSumScorer, only the disjunctions are done.
For un/ordered NearSpans with zero allowed slop, there is a division by 4
(unordered) or by 8 (ordered); something like this should also be done for
the PhraseQueries.
SpanContaining and SpanWithin use the conjunction estimation; their estimates
should also be smaller.
> Improve DISI.cost() by assuming independence for match probabilities
> --------------------------------------------------------------------
>
> Key: LUCENE-6894
> URL: https://issues.apache.org/jira/browse/LUCENE-6894
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: LUCENE-6894.patch
>
>
> The DocIdSetIterator.cost() method returns an estimation of the number of
> matching docs. Currently conjunctions use the minimum cost, and disjunctions
> use the sum of the costs, and both are too high.
> The probability of a match is estimated by dividing available cost() by the
> number of docs in a segment.
> The conjunction probability is then the product of the inputs, and the
> disjunction probability follows from De Morgan's rule:
> "not (A and B)" is the same as "(not A) or (not B)"
> with the probability for "not" computed as 1 minus the input probability.
> The independence that is assumed is normally not there. However, for cost()
> computations only an ordering of the input DISIs/Scorers is needed, and for
> that I expect this assumption to work nicely.
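
The estimation described above can be sketched roughly as follows. This is only
an illustration of the arithmetic, not the code in the attached patch: the class
name IndependentCostSketch and its methods are hypothetical, and the clamping and
rounding choices are assumptions made here.

```java
/**
 * Hypothetical sketch of cost estimation under assumed independence.
 * Not Lucene API and not the attached patch; all names are illustrative.
 */
public final class IndependentCostSketch {
    private IndependentCostSketch() {}

    /** Match probability of one input: cost / numDocs, clamped to [0, 1] (assumption). */
    public static double matchProbability(long cost, long numDocs) {
        if (numDocs <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) cost / numDocs));
    }

    /** Conjunction: the product of the input probabilities, scaled back to a doc count. */
    public static long conjunctionCost(long[] costs, long numDocs) {
        double p = 1.0;
        for (long c : costs) {
            p *= matchProbability(c, numDocs);
        }
        return Math.round(p * numDocs);
    }

    /** Disjunction: 1 minus the probability that no input matches. */
    public static long disjunctionCost(long[] costs, long numDocs) {
        double noMatch = 1.0;
        for (long c : costs) {
            noMatch *= 1.0 - matchProbability(c, numDocs);
        }
        return Math.round((1.0 - noMatch) * numDocs);
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        long[] costs = {100, 200};
        // Current behavior per the description: min = 100 for AND, sum = 300 for OR.
        System.out.println("conjunction: " + conjunctionCost(costs, numDocs));
        System.out.println("disjunction: " + disjunctionCost(costs, numDocs));
    }
}
```

With two inputs of cost 100 and 200 in a segment of 1000 docs, both estimates
come out below the current minimum (100) and sum (300), which is the effect the
description is after.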