RE: Short circuit AND or subquerying in lucene for performance

Uwe Schindler Wed, 15 Feb 2012 13:15:54 -0800

> : Basically for queries such as field1:foo AND field2:*bar, I think it
> : would be highly beneficial to restrict evaluation of the second field on
> : the result of the first to avoid scanning the index in its entirety due
> : to the leading wildcard.
> 
> This is exactly how the BooleanQuery class in Lucene works.
> 
> Please note the logic in ConjunctionScorer and BooleanScorer2 (how much
> optimizing can be done depends on whether all of the clauses are required
> or not)


The problem here is more the leading wildcard query itself. Its terms are
enumerated before any scoring or result collection takes place (partly
during query rewrite, partly while building a bitset before the scorer
starts, depending on term density). The short-circuiting in BooleanScorer2
(BS2) therefore only kicks in once the wildcard bitsets have already been
computed. For wildcard queries there is no way to optimize document
collection, because *every* matching term has to be scanned and its
TermDocs retrieved.
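
To illustrate the ordering, here is a rough sketch in plain Python. It is a
toy model, not Lucene code: the INDEX data and the expand_wildcard and
conjunction helpers are made up for this example. It only shows that the
wildcard clause walks the whole term dictionary before the conjunction can
skip anything, no matter how selective the other clause is.

```python
import fnmatch

# Toy inverted index (hypothetical data): field -> term -> sorted doc ids.
INDEX = {
    "field1": {"foo": [2, 7], "baz": [1, 3, 5]},
    "field2": {"crowbar": [1, 7], "bar": [2, 4], "barn": [3], "zebra": [5, 7]},
}


def expand_wildcard(field, pattern):
    """Scan every term of `field`; return (matching doc ids, terms scanned)."""
    docs, scanned = set(), 0
    for term, postings in INDEX[field].items():
        scanned += 1  # a leading wildcard gives no prefix to seek to
        if fnmatch.fnmatchcase(term, pattern):
            docs.update(postings)  # postings of each matching term are read
    return docs, scanned


def conjunction(term_field, term, wc_field, pattern):
    """Evaluate term_field:term AND wc_field:pattern in the order described."""
    # Step 1: the wildcard clause is fully expanded up front.
    wc_docs, scanned = expand_wildcard(wc_field, pattern)
    # Step 2: only now can the intersection skip non-matching documents.
    hits = [d for d in INDEX[term_field].get(term, []) if d in wc_docs]
    return hits, scanned


hits, scanned = conjunction("field1", "foo", "field2", "*bar")
print(hits, scanned)
```

Even if field1 matches no document at all, the number of terms scanned in
field2 stays the same in this model, which is the cost described above.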

Uwe


