Re: expensive post filtering of a query's result

Andreas Brandl Tue, 26 Nov 2013 05:55:14 -0800

Uwe,

> Lucene Filters are always executed up front on the full index. This is
> done inside getDocIdSet(), which is similar to scorer() in Queries.
> Most filters return a bitset from this method, so they calculate the
> whole bitset on the full index; this is what your filter is doing.
> The strategy only applies to how the DocIdSetIterator is consumed
> and how it is interleaved with the query's scorer (which is also a
> DocIdSetIterator). Bitset-based filters are already built, so there
> is no real speed difference. For leap-frog there is room to improve,
> as that approach advances the iterators until they stop on the same
> document. The only chance to do some type of "post-filtering" is
> using a leap-frog approach and implementing
> DocIdSetIterator.advance() in an efficient way. For
> FixedBitSet/OpenBitSet there is nothing you can do, as the matching
> bits were calculated beforehand. FYI, when calling getDocIdSet(),
> Lucene only provides acceptDocs if there are deleted documents in
> the index or another filter was executed before.
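The leap-frog idea can be sketched outside Lucene as follows. This is a minimal, self-contained illustration, not Lucene code: `DocIdIterator`, `ArrayIterator`, `LazyFilterIterator` and `LeapFrog` are hypothetical names, and the interface only mimics the DocIdSetIterator contract (increasing doc ids, advance(target) returning the first doc >= target, NO_MORE_DOCS = Integer.MAX_VALUE). The filter runs its expensive check only from the target it is advanced to, instead of precomputing a bitset over the whole index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for the DocIdSetIterator contract (not the real class):
// doc ids come back in increasing order, NO_MORE_DOCS marks exhaustion.
interface DocIdIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target); // first doc >= target; target > docID()
}

// Iterator over a sorted int array, standing in for the query's scorer.
class ArrayIterator implements DocIdIterator {
    private final int[] docs;
    private int idx = -1;
    ArrayIterator(int... docs) { this.docs = docs; }
    public int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
    public int nextDoc() { idx++; return docID(); }
    public int advance(int target) {
        do { idx++; } while (idx < docs.length && docs[idx] < target);
        return docID();
    }
}

// Filter that evaluates a (hypothetical) expensive predicate lazily,
// only starting at the doc it is asked to advance to.
class LazyFilterIterator implements DocIdIterator {
    private final IntPredicate matches;
    private final int maxDoc;
    private int doc = -1;
    int evaluations = 0; // counts predicate calls, for illustration only
    LazyFilterIterator(IntPredicate matches, int maxDoc) {
        this.matches = matches;
        this.maxDoc = maxDoc;
    }
    public int docID() { return doc; }
    public int nextDoc() {
        return doc == NO_MORE_DOCS ? NO_MORE_DOCS : advance(doc + 1);
    }
    public int advance(int target) {
        for (int d = target; d < maxDoc; d++) {
            evaluations++;
            if (matches.test(d)) return doc = d;
        }
        return doc = NO_MORE_DOCS;
    }
}

class LeapFrog {
    // Leap-frog intersection: whichever iterator is behind is advanced
    // to the other's doc until both stop on the same document.
    static List<Integer> intersect(DocIdIterator lead, DocIdIterator other) {
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != DocIdIterator.NO_MORE_DOCS) {
            int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
            if (otherDoc == doc) {
                hits.add(doc);
                doc = lead.nextDoc();
            } else {
                doc = lead.advance(otherDoc); // otherDoc > doc here
            }
        }
        return hits;
    }
}
```

With a scorer matching docs 2, 5, 9, 14 and 20 and a filter accepting even doc ids in a 25-doc index, the intersection yields 2, 14 and 20, and the predicate runs 7 times rather than 25. The lazy advance() still scans forward from its target, so how much this saves depends on how sparse the filter's matches are.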


Thank you for the insights! I was clearly heading down the wrong road.

> To do post-filtering (e.g., Solr allows this), you have to do it in
> the result Collector implementation. The collector is used for
> result collection and gets every document id that matches the query
> and filters. In the collector's collect(int docId) you can do the
> expensive post-filtering, as collect() is only called for matching
> documents. One way to do this is to write a filtering wrapper around
> the final collector that only delegates those collect() calls that
> match your post-filter.
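A minimal sketch of such a wrapping collector, under a simplified interface: `DocCollector`, `ListCollector`, `PostFilterCollector` and `PostFilterDemo` are hypothetical names, and `DocCollector` only models the collect(int doc) callback. In the Lucene 4.x API a real Collector also has setScorer(), setNextReader() and acceptsDocsOutOfOrder(), which a wrapper would forward to its delegate; the doc ids passed to collect() there are relative to the current segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for the collect() part of a Lucene Collector.
interface DocCollector {
    void collect(int doc);
}

// Final collector: simply records the hits it receives.
class ListCollector implements DocCollector {
    final List<Integer> hits = new ArrayList<>();
    public void collect(int doc) { hits.add(doc); }
}

// Wrapper that runs the expensive post-filter only on documents that
// already matched query + filters, and forwards the survivors.
class PostFilterCollector implements DocCollector {
    private final DocCollector delegate;
    private final IntPredicate postFilter; // hypothetical expensive check
    int evaluations = 0; // counts post-filter calls, for illustration only
    PostFilterCollector(DocCollector delegate, IntPredicate postFilter) {
        this.delegate = delegate;
        this.postFilter = postFilter;
    }
    public void collect(int doc) {
        evaluations++;
        if (postFilter.test(doc)) {
            delegate.collect(doc);
        }
    }
}

class PostFilterDemo {
    // Simulates the search loop: collect() is called once per matching doc.
    static void search(int[] matchingDocs, DocCollector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc);
        }
    }
}
```

Feeding the matching docs 1, 3, 4, 9, 10 and 12 through a wrapper whose post-filter accepts multiples of 3 leaves 3, 9 and 12 in the final collector, with the expensive check evaluated once per matching document (6 times) rather than once per document in the index.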

Thanks again, that worked very well.

Can you recommend a good resource that covers Lucene at this level of detail? I
know of 'Lucene in Action', but I'm afraid it's a bit outdated by now, isn't
it?

Regards,
Andreas

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
