Hi all

I'm currently benchmarking Lucene to get an understanding of what optimisations are available for long queries, and wanted to check what the recommended approach is.

Unsurprisingly, a naive approach to long queries (just keep adding SHOULD clauses to one big BooleanQuery) scales close to linearly with the number of terms, which beyond a certain point isn't fast enough.

The obvious solution is to prune the query to reduce the number of documents that need scoring. This is easy to do, but has the downside that the pruned terms no longer contribute to scoring at all.
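To make that trade-off concrete, here is a toy Java sketch. It uses a plain in-memory map from term to sorted doc IDs as a stand-in for a real index, and the class and method names are hypothetical, not Lucene API. Pruning to the k rarest terms (lowest document frequency) shrinks the candidate set, but documents matching only the dropped terms are never seen, and the dropped terms add nothing to the scores of the documents that remain.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical sketch: prune a long disjunctive query down to its k rarest terms. */
final class PruneSketch {

    /** Returns the k query terms with the shortest posting lists (lowest document frequency). */
    static List<String> pruneToRarest(Map<String, int[]> postings, List<String> queryTerms, int k) {
        return queryTerms.stream()
                .filter(postings::containsKey)
                .sorted(Comparator.comparingInt((String t) -> postings.get(t).length))
                .limit(k)
                .collect(Collectors.toList());
    }

    /** Candidate documents for a pure OR over the given terms: the union of their postings. */
    static Set<Integer> candidates(Map<String, int[]> postings, List<String> terms) {
        Set<Integer> docs = new TreeSet<>();
        for (String t : terms) {
            for (int doc : postings.getOrDefault(t, new int[0])) {
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = Map.of(
                "lucene", new int[] {1, 4},
                "xapian", new int[] {2},
                "the", new int[] {0, 1, 2, 3, 4, 5},
                "query", new int[] {1, 2, 3, 5});
        List<String> query = List.of("the", "lucene", "query", "xapian");

        List<String> kept = pruneToRarest(postings, query, 2);
        System.out.println(kept);                        // [xapian, lucene]
        System.out.println(candidates(postings, query)); // [0, 1, 2, 3, 4, 5]
        System.out.println(candidates(postings, kept));  // [1, 2, 4]
    }
}
```

The pruned query scores half as many documents here, but "the" and "query" are simply gone from the score.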

In Xapian there's a handy query operator called OP_AND_MAYBE, where only terms on the left-hand side are used to select documents, and terms on the right-hand side are used for scoring only. This performs much better for long queries if the less discriminative terms are moved onto the right-hand side.
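As I understand those semantics, they can be modelled roughly as below. This is a sketch, not Xapian's implementation: it treats the left-hand side as an OR of terms (as in my Lucene attempt), and the in-memory postings map, the per-term weights and the class name are all assumptions for illustration. The match set comes from the left-hand terms alone; each matched document's score then adds the weights of any left- or right-hand terms it contains.

```java
import java.util.*;

/** Hypothetical model of AND_MAYBE: left-hand terms select documents, right-hand terms only add score. */
final class AndMaybeSketch {

    /** Sums the weight of every term in {@code terms} whose postings contain {@code doc}. */
    static double scoreOf(int doc, Map<String, int[]> postings, Map<String, Double> weights, List<String> terms) {
        double score = 0.0;
        for (String t : terms) {
            int[] docs = postings.get(t);
            if (docs != null && Arrays.binarySearch(docs, doc) >= 0) {
                score += weights.getOrDefault(t, 1.0);
            }
        }
        return score;
    }

    /** Matches documents containing any left-hand term; scores them with left- and right-hand terms. */
    static Map<Integer, Double> andMaybe(Map<String, int[]> postings, Map<String, Double> weights,
                                         List<String> left, List<String> right) {
        Map<Integer, Double> results = new TreeMap<>();
        for (String t : left) {
            for (int doc : postings.getOrDefault(t, new int[0])) {
                results.putIfAbsent(doc, 0.0); // candidate selection uses the left-hand side only
            }
        }
        for (Map.Entry<Integer, Double> e : results.entrySet()) {
            int doc = e.getKey();
            e.setValue(scoreOf(doc, postings, weights, left) + scoreOf(doc, postings, weights, right));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = Map.of(
                "lucene", new int[] {1, 4},
                "query", new int[] {1, 2, 3, 5});
        Map<String, Double> weights = Map.of("lucene", 2.0, "query", 0.5);
        // Documents 2, 3 and 5 contain "query" but not "lucene", so they are never matched.
        System.out.println(andMaybe(postings, weights, List.of("lucene"), List.of("query")));
        // {1=2.5, 4=2.0}
    }
}
```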

I tried to replicate this approach in Lucene using the following query (in QueryParser syntax):

+(some mandatory terms) some other terms for scoring only

The presence of a MUST clause in the outer BooleanQuery makes the remaining SHOULD clauses purely optional, so they don't expand the set of documents scored, and this has the right semantics. However, the performance benefit isn't there: in a test with 200 query terms in total, this query quickly becomes slower than a plain flat BooleanQuery once the mandatory part contains more than 5 or so terms.

Interestingly, it's much, much faster with only one mandatory term (~40 ms) than with two terms in the mandatory clause (~2500 ms), which makes me suspect an obvious optimisation is being missed.
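For reference, the kind of optimisation I'd expect (a hypothesis, not a diagnosis of what Lucene 4's scorers actually do) is doc-at-a-time evaluation where the mandatory clause drives iteration and each optional posting list is only advanced to the current candidate, rather than being scanned in full. The toy sketch below uses sorted int arrays with a binary-search advance() as a stand-in for Lucene postings and DocIdSetIterator.advance; the class names and the match-count score are assumptions for illustration.

```java
import java.util.*;

/** Hypothetical doc-at-a-time sketch: required terms drive iteration; optional postings are skipped, not scanned. */
final class ReqOptSketch {

    /** Cursor over one sorted posting list, with a binary-search advance(). */
    static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) { this.docs = docs; }

        /** Moves to the first doc >= target and returns it, or Integer.MAX_VALUE if exhausted. */
        int advance(int target) {
            int i = Arrays.binarySearch(docs, pos, docs.length, target);
            pos = i >= 0 ? i : -i - 1;
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }
    }

    /** Scores each doc matching at least one required term; optional terms never add candidates. */
    static Map<Integer, Integer> score(List<int[]> required, List<int[]> optional) {
        // Candidate set: union of the required postings (an OR inside the MUST clause).
        TreeSet<Integer> candidates = new TreeSet<>();
        for (int[] docs : required) {
            for (int d : docs) candidates.add(d);
        }
        List<Cursor> cursors = new ArrayList<>();
        for (int[] docs : optional) cursors.add(new Cursor(docs));

        Map<Integer, Integer> scores = new TreeMap<>();
        for (int doc : candidates) { // ascending order, so cursors only ever move forward
            int matched = 0;
            for (int[] docs : required) {
                if (Arrays.binarySearch(docs, doc) >= 0) matched++;
            }
            for (Cursor c : cursors) {
                if (c.advance(doc) == doc) matched++; // O(log n) skip instead of a linear scan
            }
            scores.put(doc, matched); // toy score: number of matching terms
        }
        return scores;
    }

    public static void main(String[] args) {
        List<int[]> required = List.of(new int[] {3, 9}, new int[] {9, 20});
        List<int[]> optional = List.of(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, new int[] {9, 15, 20});
        System.out.println(score(required, optional)); // {3=2, 9=4, 20=2}
    }
}
```

With this structure the cost should grow with (number of candidates) x (number of optional terms) x log(posting length), rather than with the total length of all the optional posting lists.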

Anyone have any ideas on this? I'd also welcome pointers to other relevant query types or optimisations available in Lucene 4, or to the parts of the Query/Weight/Scorer code we'd need to change to speed up this kind of query.

Cheers
-Matt
