On Tue, Nov 20, 2012 at 6:18 PM, Trejkaz <trej...@trypticon.org> wrote:
> I have a feature I wanted to implement which required a quick way to
> check whether an individual document matched a query or not.
>
> IndexSearcher.explain seemed to be a good fit for this.
>
> The query I tested was just a BooleanQuery with two TermQuery inside
> it, both with MUST. I ran an empty query to match all documents and
> then ran the new code against each document. Within 40,743 documents,
> 1,072 documents matched the query.
>
> I got the times of around 15.5s doing this. After noticing that
> ConstantScoreQuery now works with Query in addition to Filter, I
> started using it as well, which further reduced this time to 13.6s.
>
> There is a comment like this on the explain method, though:
>
> "Computing an explanation is as expensive as executing
> the query over the entire index."
>
> So I wanted to test this. To do this, I made a collector which did
> nothing but look for the single item being matched.
>
> Times for searching the whole index using this collector came to
> around 30.9s, which is more than twice as slow as using explain (times
> didn't vary at all if I used ConstantScoreQuery here, which I assume
> is something to do with using a custom collector which is ignoring the
> scorer.)
>
> So I was wondering, is this comment just out of date? It seems that by
> using explain(), I get the same information I get by querying the
> whole index, *plus* information about the score which the custom
> collector wasn't recording, all in less than half the time it took to
> query the whole index.
>

Explain is not performant... but the comment is fair, I think? It's
more of a worst case, and it depends on the query.

Explain is going to rewrite the query, create the weight, and so on,
just to advance() the scorer to that single doc.

So if this is, e.g., a wildcard query, then it could definitely be
almost as slow as searching the whole index, since the rewrite involves
scanning through the term dictionary for matching terms.
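
To picture why advance() to a single doc can be cheaper than a full search for a simple two-term MUST query, here is a toy sketch. It is not Lucene code: the class ExplainCostSketch, its method names, and the sorted int arrays standing in for postings lists are all made up for illustration, and the binary search is only a stand-in for however a real scorer skips ahead. It also leaves out the rewrite and weight-creation cost, which is the part that can dominate for something like a wildcard query.

```java
import java.util.Arrays;

// Hypothetical toy model, not Lucene: each term is a sorted array of doc IDs.
class ExplainCostSketch {

    // Mimics advance(target): first doc ID >= target, or Integer.MAX_VALUE if none.
    static int advance(int[] postings, int target) {
        int idx = Arrays.binarySearch(postings, target);
        if (idx < 0) {
            idx = -idx - 1; // not found: convert to the insertion point
        }
        return idx < postings.length ? postings[idx] : Integer.MAX_VALUE;
    }

    // Explain-style check: both MUST clauses have to land exactly on docId.
    static boolean matchesByAdvance(int[] termA, int[] termB, int docId) {
        return advance(termA, docId) == docId && advance(termB, docId) == docId;
    }

    // Collector-style check: walk the whole intersection and watch for docId.
    static boolean matchesByFullScan(int[] termA, int[] termB, int docId) {
        boolean found = false;
        int i = 0;
        int j = 0;
        while (i < termA.length && j < termB.length) {
            if (termA[i] == termB[j]) {
                // Keep going after a hit, like a collector that sees every match.
                if (termA[i] == docId) {
                    found = true;
                }
                i++;
                j++;
            } else if (termA[i] < termB[j]) {
                i++;
            } else {
                j++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[] termA = {1, 4, 7, 9, 12};
        int[] termB = {2, 4, 9, 10, 12};
        System.out.println(matchesByAdvance(termA, termB, 9));
        System.out.println(matchesByAdvance(termA, termB, 7));
        System.out.println(matchesByFullScan(termA, termB, 9));
    }
}
```

Both checks give the same answer for a given doc, but the advance-style check touches only a few entries per term while the full scan visits every posting. That is roughly the gap for plain term queries; it says nothing about queries whose rewrite step is itself expensive.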