I implemented WAND in Solr for our own project, and it can improve performance a lot. For reference: http://dl.acm.org/citation.cfm?id=956944
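To give a rough idea of how WAND saves work, here is a minimal sketch of its top-k loop in plain Java. It assumes each term's postings are already in memory as ascending doc ids with a precomputed per-document score, plus the maximum score the term can contribute to any document. `WandSketch`, `Cursor` and `topK` are hypothetical names; this is neither the Solr implementation mentioned above nor a Lucene API. The key step is the pivot: documents that could only match terms whose combined upper bounds cannot beat the current k-th best score are skipped without being scored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical in-memory sketch of WAND top-k retrieval; not Solr or Lucene code. */
public class WandSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Assumed layout: one term's postings as ascending doc ids with a precomputed score per posting. */
    static final class Cursor {
        final int[] docs;
        final double[] scores;
        final double upperBound; // the most this term can add to any document's score
        int pos = 0;

        Cursor(int[] docs, double[] scores) {
            this.docs = docs;
            this.scores = scores;
            double max = 0;
            for (double s : scores) max = Math.max(max, s);
            this.upperBound = max;
        }

        int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        double score() { return scores[pos]; }
        void next() { pos++; }
        // Linear scan for brevity; a real index would jump ahead using skip data.
        void advance(int target) { while (doc() < target) pos++; }
    }

    record Hit(int doc, double score) {}

    /** Returns the k best-scoring docs, best first. Consumes (advances) the cursors. */
    static List<Hit> topK(List<Cursor> cursors, int k) {
        if (k <= 0) return new ArrayList<>();
        // Min-heap by score: the head is the weakest of the current top-k.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        List<Cursor> terms = new ArrayList<>(cursors);
        while (true) {
            // A doc must score strictly above this to enter the top-k.
            double threshold = heap.size() < k ? 0.0 : heap.peek().score();
            terms.sort(Comparator.comparingInt(Cursor::doc));
            // Pivot: first term (in doc order) where the summed upper bounds exceed the threshold.
            double bound = 0;
            int pivot = -1;
            for (int i = 0; i < terms.size() && terms.get(i).doc() != NO_MORE_DOCS; i++) {
                bound += terms.get(i).upperBound;
                if (bound > threshold) { pivot = i; break; }
            }
            if (pivot < 0) break; // no remaining doc can beat the threshold
            int pivotDoc = terms.get(pivot).doc();
            if (terms.get(0).doc() == pivotDoc) {
                // Every term up to the pivot is on pivotDoc: compute its full score.
                double score = 0;
                for (Cursor c : terms) {
                    if (c.doc() == pivotDoc) { score += c.score(); c.next(); }
                }
                if (score > threshold) {
                    heap.add(new Hit(pivotDoc, score));
                    if (heap.size() > k) heap.poll();
                }
            } else {
                // Docs before pivotDoc can only match terms before the pivot, whose
                // bounds sum to <= threshold, so skip them without scoring.
                for (int i = 0; i < pivot; i++) terms.get(i).advance(pivotDoc);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Hit::score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Cursor> terms = List.of(
            new Cursor(new int[] {1, 3, 5, 7}, new double[] {1.0, 1.0, 1.0, 1.0}),
            new Cursor(new int[] {3, 7}, new double[] {2.0, 2.0}),
            new Cursor(new int[] {5, 7, 9}, new double[] {0.5, 3.0, 0.5}));
        for (Hit h : topK(terms, 2)) {
            System.out.println("doc " + h.doc() + " score " + h.score());
        }
        // prints "doc 7 score 6.0" then "doc 3 score 3.0"
    }
}
```

With these postings, doc 9 is never scored: once the top-2 threshold reaches 3.0, the last remaining term's upper bound (3.0) can no longer exceed it. A real implementation would also avoid re-sorting every cursor on each iteration.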
WAND does need a small change to the index, though.

Thanks,

On Tue, Dec 11, 2012 at 6:19 AM, Matthew Willson <matt...@swiftkey.net> wrote:
> Hi all
>
> I'm currently benchmarking Lucene to get an understanding of what
> optimisations are available for long queries, and wanted to check what the
> recommended approach is.
>
> Unsurprisingly a naive approach to long queries (just keep adding SHOULD
> clauses to a big BooleanQuery) scales close to linearly in the number of
> terms, which beyond a certain point isn't good enough.
>
> The obvious solution is to prune the query in order to reduce the number
> of documents which need scoring, and this is easy to do, but has the
> downside that none of the pruned terms are used for scoring.
>
> In Xapian there's a handy query operator called OP_AND_MAYBE, where only
> terms on the left-hand-side are used to select documents, with terms on the
> right-hand-side used for scoring only. This performs much better for long
> queries if less discriminative terms are moved onto the right-hand-side.
>
> I tried to replicate this approach in Lucene using the following query (in
> QueryParser syntax):
>
> +(some mandatory terms) and some other terms for scoring only
>
> The presence of a MUST clause in the outer BooleanQuery forces the
> remaining SHOULD clauses to be purely optional and not expand the set of
> documents scored, so this has the right semantics. However the performance
> benefit isn't there -- in a test with 200 query terms in total, it quickly
> becomes slower than a plain flat BooleanQuery once the number of terms in
> the mandatory part of the query exceeds 5 or so.
>
> Interestingly it's much much faster (~40ms) when there's only one
> mandatory term, than when there are two terms in the mandatory clause
> (~2500ms), which leads me to suspect an obvious optimisation is being
> missed.
>
> Anyone have any ideas on this, pointers to other relevant query types or
> optimisations available in Lucene 4, or on which parts of the
> Query/Weight/Scorer code we'd need to change to speed up this kind of thing?
>
> Cheers
> -Matt
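For comparison, the semantics described in the quoted message (Xapian's OP_AND_MAYBE, or `+(mandatory terms) optional terms` in Lucene) can be modelled in a few lines of plain Java: only documents matched by the required group become candidates, and the scoring-only terms merely add to their scores. This assumes QueryParser's default OR operator, so the parenthesised group matches a document containing at least one of its terms. `AndMaybeSketch`, `andMaybe` and the doc-to-score maps are hypothetical; the model shows the semantics only and says nothing about how Lucene's own scorers evaluate the query.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of "+(selector terms) scoring-only terms" semantics; not Lucene code. */
public class AndMaybeSketch {
    /**
     * Each map is one term's postings: doc id -> that term's score contribution.
     * Returns doc id -> total score, for matching docs only.
     */
    static Map<Integer, Double> andMaybe(List<Map<Integer, Double>> selectors,
                                         List<Map<Integer, Double>> scoringOnly) {
        Map<Integer, Double> scores = new TreeMap<>();
        // Candidates: docs containing at least one term of the required group
        // (OR inside the parentheses, as with QueryParser's default operator).
        for (Map<Integer, Double> postings : selectors) {
            for (Map.Entry<Integer, Double> e : postings.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        // Scoring-only terms raise the scores of existing candidates but never add docs.
        for (Map<Integer, Double> postings : scoringOnly) {
            for (Map.Entry<Integer, Double> e : postings.entrySet()) {
                scores.computeIfPresent(e.getKey(), (doc, s) -> s + e.getValue());
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<Integer, Double> result = andMaybe(
            List.of(Map.of(1, 1.0, 2, 1.0), Map.of(2, 2.0, 4, 2.0)),
            List.of(Map.of(2, 0.5, 3, 9.0, 4, 0.25)));
        System.out.println(result); // prints {1=1.0, 2=3.5, 4=2.25}
    }
}
```

Doc 3 matches only the scoring-only term, so it is excluded even though its weight (9.0) is the largest.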