Uwe,

> Lucene Filters are always executed before on the full index. This is
> done inside getDocIdSet(), which is similar to scorer() in Querys.
> Most filters return a bitset in this method, so they calculate the
> whole bitset on the full index - this is what your filter is doing.
> The strategy only applies to the consuming of the DocIdSetIterator
> and how it is interleaved with the query's scorer (which is also a
> DocIdSetIterator). Bitset based filters are already built, so there
> is no real speed difference. For leap-frog there is room to improve,
> as the approach advances the iterators until they stop on the same
> document. The only chance to do some type of "postfiltering" is
> using a leap-frog approach and implementing
> DocIdSetIterator.advance() in an efficient way. For
> FixedBitSet/OpenBitSet there is nothing you can do, as the matching
> bits were calculated before. FYI, while calling getDocIdSet(),
> Lucene only provides acceptDocs, if there are deleted documents in
> the index or another filter was executed before.

thank you for the insights! I was clearly heading down the wrong road.

> To do post-filtering (e.g., Solr allows this), you have to do this in
> the result Collector implementation. The collector is used for
> result collection and gets every document id that matches the query+
> filters. In collector's collect(int docId) you can do the expensive
> post filtering, as collect() is only called for matching documents.
> A way to do this is to write a wrapper filter around the final
> collector and only delegating those collect() calls that match your
> post-filter.

Thanks again, that worked very well. Can you recommend a good resource
that covers this level of detail of Lucene? I know of 'Lucene in Action',
but I'm afraid it's a bit outdated by now, isn't it?

Regards,
Andreas
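
For reference, the collector-wrapper idea quoted above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not actual Lucene code: DocCollector is a hypothetical stand-in for Lucene's Collector (which, in Lucene 4.x as far as I know, also requires setScorer(), setNextReader() and acceptsDocsOutOfOrder() to be forwarded to the delegate), and PostFilterCollector and PostFilterDemo are names invented for this example. The expensive check is modelled as a plain IntPredicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for Lucene's Collector, reduced to the one method
// that matters here. A real wrapper would also delegate the other callbacks.
interface DocCollector {
    void collect(int docId);
}

// Wraps the final collector and forwards only those documents that pass the
// expensive post-filter. collect() is only called for documents that already
// matched the query and the regular filters, so the check runs on few docs.
final class PostFilterCollector implements DocCollector {
    private final DocCollector delegate;
    private final IntPredicate postFilter; // assumed: the expensive check

    PostFilterCollector(DocCollector delegate, IntPredicate postFilter) {
        this.delegate = delegate;
        this.postFilter = postFilter;
    }

    @Override
    public void collect(int docId) {
        // Caveat: in real Lucene the doc id is relative to the current
        // segment, so the check must take the segment's doc base into account.
        if (postFilter.test(docId)) {
            delegate.collect(docId);
        }
    }
}

// Simulates a search that calls collect() once per matching document.
final class PostFilterDemo {
    static List<Integer> run(int[] matchingDocs, IntPredicate postFilter) {
        List<Integer> hits = new ArrayList<>();
        DocCollector finalCollector = docId -> hits.add(docId);
        DocCollector wrapped = new PostFilterCollector(finalCollector, postFilter);
        for (int docId : matchingDocs) {
            wrapped.collect(docId);
        }
        return hits;
    }
}
```

With a real IndexSearcher, the wrapped collector would be passed to the search call in place of the final collector, so the final collector only ever sees documents that survived the post-filter.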