Yes, I think you should have a play, but on an index that is as realistic as you can make it. The performance of the different queries and filters may vary with term frequencies and loads of other stuff I don't understand. The general point is simply that YMMV.

--
Ian.


On Wed, Oct 16, 2013 at 3:07 PM, James Clarke <jcla...@basistech.com> wrote:
> Filters are created programmatically per request (and customized for the
> request) thus in order to benefit from CachingWrapperFilter we require a
> mechanism for looking up CachingWrapperFilters based on the request. But this
> is certainly an area worth trying (we could probably reuse each filter 10
> times, because of the variation in requests and NRT search).
>
> I was hoping to improve query latency by reformulating the filters and
> queries. However my intuition of the best practice for filter and query
> construction is lacking i.e., is it better to use a TermsFilter and
> MatchAllDocsQuery or a BooleanQuery of TermQuerys, or a BooleanQuery of
> ConstantScoreQuerys of TermQuery etc.
>
> Maybe I should just hunker down and create a synthetic index and try many
> different combinations of filter/query construction.
>
> On Oct 11, 2013, at 7:33 AM, Ian Lea <ian....@gmail.com> wrote:
>
>> Are you going to be caching and reusing the filters e.g. by
>> CachingWrapperFilter? The main benefit of filters is in reuse. It
>> takes time to build them in the first place, likely roughly equivalent
>> to running the underlying query although with variations as you
>> describe. Or are you saying that querying with filters is slow?
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Oct 10, 2013 at 7:01 PM, James Clarke <jcla...@basistech.com> wrote:
>>> Are there any best practices for constructing Filters to search efficiently?
>>> From my non-exhaustive experiments I cannot intuit how to construct my
>>> filters to achieve best performance.
>>>
>>> I have an index (Lucene 4.3) of about 1.8M documents which contain a field
>>> acting as a flag (evidence:true). Initially all the documents I am
>>> interested in searching have this field. Later as the index grows some
>>> documents will not have this field.
>>>
>>> In the simplest case I want to filter on documents with evidence:true.
>>> Running a couple of hundred queries sequentially and recording how long it
>>> takes to complete.
>>>
>>> * No filter: ~40s
>>> * QueryWrapperFilter(TermQuery(evidence:true)): ~80s
>>> * FieldValueFilter(evidence): ~43s
>>> * TermsFilter(evidence:true): ~50s
>>>
>>> This suggests QWF is a bad idea.
>>>
>>> A more complex filter is:
>>>
>>> (evidence:true AND (cid:x OR cid:y ...) AND language:eng)
>>>
>>> Where 1.8M documents evidence:true, 2-4 documents per cid clause, 1-60 cid
>>> clauses, and 1.4M documents lang:eng.
>>>
>>> Our initial implementation uses QWF of a BooleanQuery(TQ AND BQ(OR) AND TQ)
>>> which takes ~210s.
>>>
>>> Adjusting this to be a BooleanFilter(TermsFilter AND TermsFilter AND
>>> TermsFilter) sees things slow down to ~239s!
>>>
>>> Any advice on optimizing these filters would be appreciated!
>>>
>>> James
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
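
On James's point about needing a mechanism for looking up CachingWrapperFilters based on the request: one way this could look is a small LRU cache keyed on the parts of the request that determine the filter. The sketch below is plain Java with no Lucene dependency. FilterCache, keyFor and the key layout are hypothetical names made up for illustration, not Lucene API, and it assumes Java 8 or later (for java.util.function.Function). The cached value V would be whatever you want to reuse, for example a CachingWrapperFilter built around the request's filter.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Lucene): a small LRU cache that maps a
 * request-derived key to a reusable, expensive-to-build object.
 */
final class FilterCache<K, V> {
    private final Map<K, V> entries;
    private int builds = 0;

    FilterCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first, so
        // removeEldestEntry evicts it once the cache grows past maxEntries.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for key, building and storing it on a miss. */
    synchronized V get(K key, Function<K, V> builder) {
        V cached = entries.get(key);
        if (cached == null) {
            cached = builder.apply(key);
            builds++;
            entries.put(key, cached);
        }
        return cached;
    }

    /** Number of times the builder actually ran (i.e. cache misses). */
    synchronized int builds() {
        return builds;
    }

    synchronized int size() {
        return entries.size();
    }

    /**
     * Assumed key layout for a filter like
     * (evidence:true AND (cid:x OR cid:y ...) AND language:eng).
     * The cids are sorted and de-duplicated so clause order does not matter.
     * Assumes cid values never contain ',' or '|'.
     */
    static String keyFor(Collection<String> cids, String language) {
        return language + "|" + String.join(",", new TreeSet<String>(cids));
    }
}
```

Usage would be something like cache.get(FilterCache.keyFor(cids, "eng"), k -> buildFilter(cids, "eng")), where buildFilter stands for your own filter construction code. Whether the hit rate is high enough to pay off is something only a realistic index and request mix will tell you.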