Filters are created programmatically per request (and customized for the request), so in order to benefit from CachingWrapperFilter we would need a mechanism for looking up CachingWrapperFilters based on the request. But this is certainly an area worth trying (we could probably reuse each filter 10 times, given the variation in requests and NRT search).
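A rough sketch of what such a lookup mechanism might look like, assuming a request can be reduced to a canonical key (here the evidence flag, the set of cids, and the language). FilterCache, key() and the key layout are hypothetical names made up for illustration, not Lucene API. The cache is generic so the sketch runs without Lucene on the classpath; in practice the cached value would be a CachingWrapperFilter wrapping whatever filter is built for the request.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical helper: a small LRU cache of per-request filters, looked up by a canonical request key. */
final class FilterCache<F> {
    private final Map<String, F> cache;

    FilterCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the least recently used filter.
        this.cache = new LinkedHashMap<String, F>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, F> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Builds a canonical key for a request. The cids are sorted and de-duplicated so that
     * [x, y] and [y, x, x] map to the same cached filter. Assumes cid values contain no ',' or ';'.
     */
    static String key(boolean evidenceOnly, Collection<String> cids, String language) {
        return "evidence=" + evidenceOnly
                + ";cid=" + String.join(",", new TreeSet<>(cids))
                + ";language=" + language;
    }

    /** Returns the cached filter for the key, building and storing it on a miss. */
    synchronized F get(String key, Function<String, F> builder) {
        F filter = cache.get(key);
        if (filter == null) {
            filter = builder.apply(key);
            cache.put(key, filter);
        }
        return filter;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With only about 10 reuses per filter, whether the bookkeeping pays for itself is something we would have to measure; the cache size and eviction policy above are guesses.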
I was hoping to improve query latency by reformulating the filters and queries. However, I lack intuition about the best practice for filter and query construction: is it better to use a TermsFilter with a MatchAllDocsQuery, a BooleanQuery of TermQuerys, a BooleanQuery of ConstantScoreQuerys of TermQuery, and so on? Maybe I should just hunker down, create a synthetic index, and try many different combinations of filter/query construction.

On Oct 11, 2013, at 7:33 AM, Ian Lea <ian....@gmail.com> wrote:

> Are you going to be caching and reusing the filters e.g. by
> CachingWrapperFilter? The main benefit of filters is in reuse. It
> takes time to build them in the first place, likely roughly equivalent
> to running the underlying query although with variations as you
> describe. Or are you saying that querying with filters is slow?
>
> --
> Ian.
>
> On Thu, Oct 10, 2013 at 7:01 PM, James Clarke <jcla...@basistech.com> wrote:
>> Are there any best practices for constructing Filters to search efficiently?
>> From my non-exhaustive experiments I cannot intuit how to construct my
>> filters to achieve best performance.
>>
>> I have an index (Lucene 4.3) of about 1.8M documents which contain a field
>> acting as a flag (evidence:true). Initially all the documents I am
>> interested in searching have this field. Later as the index grows some
>> documents will not have this field.
>>
>> In the simplest case I want to filter on documents with evidence:true.
>> Running a couple of hundred queries sequentially and recording how long it
>> takes to complete.
>>
>> * No filter: ~40s
>> * QueryWrapperFilter(TermQuery(evidence:true)): ~80s
>> * FieldValueFilter(evidence): ~43s
>> * TermsFilter(evidence:true): ~50s
>>
>> This suggests QWF is a bad idea.
>>
>> A more complex filter is:
>>
>> (evidence:true AND (cid:x OR cid:y ...) AND language:eng)
>>
>> Where 1.8M documents evidence:true, 2-4 documents per cid clause, 1-60 cid
>> clauses, and 1.4M documents lang:eng.
>>
>> Our initial implementation uses QWF of a BooleanQuery(TQ AND BQ(OR) AND TQ)
>> which takes ~210s.
>>
>> Adjusting this to be a BooleanFilter(TermsFilter AND TermsFilter AND
>> TermsFilter) sees things slow down to ~239s!
>>
>> Any advice on optimizing these filters would be appreciated!
>>
>> James

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org