On Sun, Aug 8, 2010 at 11:13 AM, Paul Elschot <[email protected]> wrote:
> On Sunday 08 August 2010 16:04:54, Michael McCandless wrote:
>> I noticed that the SubScorer in BooleanScorer is able to handle
>> "required" clauses, and spends some CPU confirming each hit matches
>> the required clauses.
>>
>> Yet, BooleanQuery will never do so (it always uses BooleanScorer2 if
>> there are any required clauses).
>>
>> And, if I assert !required in BooleanScorer, all tests pass... so it
>> really looks to be unused code.
>>
>> Does anyone know the history here?
>
> BooleanScorer2 was introduced to use advance() (formerly skipTo())
> when not all subscorers are required.
> IIRC, when skipTo() was introduced it was initially only used by
> ConjunctionScorer (all required, AND type query) and PhraseScorer.
> BooleanScorer works nicely when some sub-scorers are required
> but it neither uses nor provides advance(), so it should always be
> used with some care. And when no particular sub-scorer is required
> (OR type query) BooleanScorer is the fastest one around, but it can
> score docs out of order.

OK, but it looks like right now we never pass required clauses to BooleanScorer.

>> Did we use to have BooleanScorer
>> handle certain BQ's with required clauses?
>
> Before skipTo() all such BQ's were handled by BooleanScorer.

OK.

>> (It seems likely it could
>> give better performance in many cases, e.g. when the freqs of the 2
>> sub-queries are comparable).
>
> Do you mean when the 2 sub-queries have many docs in common?
> In that case an AND is almost equivalent to an OR, so
> BooleanScorer could indeed be faster than BooleanScorer2.

I meant when the two clauses have similar freqs ("typically" this will
mean they have many docs in common, but not necessarily).

advance() has a fairly high cost, so it really should only be used
when it's expected to save a good number of next() calls.
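
To make that trade-off concrete, here is a rough, self-contained Java
sketch. It is not Lucene code: Postings, intersectWithAdvance and
intersectWithNext are hypothetical names, advance() is modelled as a
binary search over an in-memory array, and the next()-only variant is a
plain merge rather than BooleanScorer's actual algorithm. It only
counts how many calls each style makes.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class AdvanceVsNext {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy sorted doc-id iterator that counts calls; hypothetical, not a Lucene class. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;
        int nextCalls;
        int advanceCalls;

        Postings(int[] sortedDocs) { this.docs = sortedDocs; }

        /** Current doc id: -1 before the first call, NO_MORE_DOCS when exhausted. */
        int doc() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int next() {
            nextCalls++;
            if (pos < docs.length) pos++;
            return doc();
        }

        /** Moves to the first doc >= target that lies after the current position. */
        int advance(int target) {
            advanceCalls++;
            int from = Math.min(pos + 1, docs.length);
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return doc();
        }
    }

    /** AND of two clauses: iterate the lead clause, skip the other one forward. */
    static int intersectWithAdvance(Postings lead, Postings other) {
        int hits = 0;
        for (int doc = lead.next(); doc != NO_MORE_DOCS; doc = lead.next()) {
            if (other.doc() < doc) other.advance(doc);
            if (other.doc() == doc) hits++;
            if (other.doc() == NO_MORE_DOCS) break;
        }
        return hits;
    }

    /** AND of two clauses using next() only (a simple merge of both lists). */
    static int intersectWithNext(Postings a, Postings b) {
        int hits = 0;
        int da = a.next();
        int db = b.next();
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) { hits++; da = a.next(); db = b.next(); }
            else if (da < db) da = a.next();
            else db = b.next();
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] common = IntStream.range(0, 1000).toArray();
        int[] rare = {500, 900};

        Postings lead = new Postings(rare);
        Postings other = new Postings(common);
        int hits = intersectWithAdvance(lead, other);
        System.out.println("advance-based: hits=" + hits
            + " next()=" + (lead.nextCalls + other.nextCalls)
            + " advance()=" + other.advanceCalls);

        Postings a = new Postings(rare);
        Postings b = new Postings(common);
        hits = intersectWithNext(a, b);
        System.out.println("next-only:     hits=" + hits
            + " next()=" + (a.nextCalls + b.nextCalls));
    }
}
```

With a rare lead clause, each advance() replaces a long run of next()
calls. When both lists have similar sizes, nearly every lead doc
triggers its own advance(), so the skipping buys little and its
per-call overhead dominates.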

My guess is we should re-activate BooleanScorer (BS) for scoring
required clauses in certain cases.  It will likely be much faster; the
problem is that it's hard to figure out what those cases are, because
our Scorers can't estimate their rough hit counts.  Maybe we should add
an "estimatedCount" or something to Scorer... but until then, maybe we
should comment out the BS code that tries to handle required clauses.

BooleanScorer2 (BS2) will be faster in highly lopsided cases (an AND of
a rare term with a common one).
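
As a sketch of how such an estimate might drive the choice: the Java
below is purely illustrative. Scorer has no estimatedCount today, and
ScorerChoice, Strategy and the LOPSIDED_RATIO threshold are all made-up
names and values that would need benchmarking before meaning anything.

```java
final class ScorerChoice {
    /** NEXT_BASED stands in for BooleanScorer, ADVANCE_BASED for BooleanScorer2. */
    enum Strategy { NEXT_BASED, ADVANCE_BASED }

    /** Hypothetical cut-off; an assumption, not a measured value. */
    static final double LOPSIDED_RATIO = 8.0;

    /**
     * Picks a strategy from per-clause hit-count estimates (the
     * "estimatedCount" idea; no such method exists on Scorer).
     */
    static Strategy choose(long[] estimatedCounts) {
        if (estimatedCounts.length == 0) {
            throw new IllegalArgumentException("need at least one clause");
        }
        long min = Long.MAX_VALUE;
        long max = 0;
        for (long c : estimatedCounts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        // A clause with no hits ends a skipping conjunction immediately.
        if (min == 0) return Strategy.ADVANCE_BASED;
        // Lopsided clauses: skipping saves many next() calls.
        // Comparable clauses: plain iteration avoids the advance() overhead.
        return (double) max / min >= LOPSIDED_RATIO
            ? Strategy.ADVANCE_BASED
            : Strategy.NEXT_BASED;
    }

    public static void main(String[] args) {
        System.out.println(choose(new long[] {1000, 1200}));
        System.out.println(choose(new long[] {10, 1_000_000}));
    }
}
```

A single min/max ratio is about the crudest rule possible. It ignores
how the docs are distributed, which matters because similar freqs do
not guarantee many docs in common.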

Mike

