Re: Optimizing Filters

Ian Lea Thu, 17 Oct 2013 02:07:15 -0700

Yes, I think you should have a play, but on an index that is as
realistic as you can make it: there may be variations in performance
of the different queries and filters depending on term frequencies and
loads of other stuff I don't understand. The general point is simply
that YMMV.
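For that kind of experimenting, a minimal timing harness in plain Java (no Lucene dependency) might look like the sketch below. `FilterBench` and its methods are hypothetical names. In real use each `Runnable` would run the same batch of searches through an `IndexSearcher` with one filter construction (QueryWrapperFilter, FieldValueFilter, TermsFilter, ...). The harness does a few untimed warm-up runs so JIT compilation and cold caches distort the numbers less, then reports the median of several timed runs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical timing harness for comparing filter/query variants. */
final class FilterBench {

    /** Median of the samples; averages the two middle values for an even count. */
    static double medianNanos(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Runs the task `warmup` times untimed, then `runs` times timed. */
    static long[] time(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Times every variant; returns name -> median nanoseconds in insertion order. */
    static Map<String, Double> compare(Map<String, Runnable> variants, int warmup, int runs) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> e : variants.entrySet()) {
            result.put(e.getKey(), medianNanos(time(e.getValue(), warmup, runs)));
        }
        return result;
    }
}
```

Keeping every variant inside one harness run, against the same index and the same query batch, makes the comparison fairer than timing separate ad hoc runs.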



--
Ian.


On Wed, Oct 16, 2013 at 3:07 PM, James Clarke <jcla...@basistech.com> wrote:
> Filters are created programmatically per request (and customized for the
> request), so to benefit from CachingWrapperFilter we would need a
> mechanism for looking up CachingWrapperFilters based on the request. But
> this is certainly an area worth trying (we could probably reuse each filter
> about 10 times, given the variation in requests and NRT search).
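A minimal sketch of such a lookup mechanism, assuming a request can be reduced to the three clause types of the complex filter quoted further down (evidence flag, language, set of cid values). `RequestFilterCache`, `canonicalKey` and the key format are hypothetical. The type parameter `F` stands in for the cached filter; in a real setup the builder function would construct the Lucene filter and wrap it in a CachingWrapperFilter (which, as far as I know, caches per index segment, so after an NRT reopen only new segments need rebuilding). Sorting and de-duplicating the cids means requests that list the same cids in a different order share one entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical LRU cache of filters keyed by a canonical form of the request. */
final class RequestFilterCache<F> {

    private final Map<String, F> lru;
    private final Function<String, F> builder;

    RequestFilterCache(int maxEntries, Function<String, F> builder) {
        this.builder = builder;
        // accessOrder = true makes LinkedHashMap iterate least-recently-used first;
        // removeEldestEntry evicts that entry once the cap is exceeded.
        this.lru = new LinkedHashMap<String, F>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, F> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Canonical key: cids are de-duplicated and sorted, so clause order does not matter. */
    static String canonicalKey(boolean evidence, String language, List<String> cids) {
        TreeSet<String> sortedCids = new TreeSet<>(cids);
        return "evidence:" + evidence + "|language:" + language + "|cid:" + sortedCids;
    }

    /** Returns the cached filter for this key, building it on a miss. */
    synchronized F get(String key) {
        // Builds run under the lock: simple, but concurrent misses are serialised.
        F filter = lru.get(key);
        if (filter == null) {
            filter = builder.apply(key);
            lru.put(key, filter);
        }
        return filter;
    }

    synchronized int cachedCount() {
        return lru.size();
    }
}
```

With roughly 10 reuses per filter, the cap only needs to cover the set of distinct requests active between NRT reopens; the right size is something to measure rather than guess.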
>
> I was hoping to improve query latency by reformulating the filters and
> queries. However, my intuition about best practice for filter and query
> construction is lacking: for example, is it better to use a TermsFilter
> with a MatchAllDocsQuery, a BooleanQuery of TermQuerys, or a BooleanQuery
> of ConstantScoreQuerys wrapping TermQuerys, etc.?
>
> Maybe I should just hunker down and create a synthetic index and try many
> different combinations of filter/query construction.
>
> On Oct 11, 2013, at 7:33 AM, Ian Lea <ian....@gmail.com> wrote:
>
>> Are you going to be caching and reusing the filters, e.g. with
>> CachingWrapperFilter? The main benefit of filters is in reuse. It
>> takes time to build them in the first place, likely roughly equivalent
>> to running the underlying query, although with variations as you
>> describe. Or are you saying that querying with filters is slow?
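The reuse argument as a rough cost model (an illustration, not a measurement): let B be the time to build a filter, roughly the cost of running its underlying query, and k the number of requests that reuse it before it is discarded. The build cost paid per request is then:

```latex
\[
  \bar{c}_{\mathrm{build}} = \frac{B}{k},
  \qquad k = 1 \;\Rightarrow\; \bar{c}_{\mathrm{build}} = B,
  \qquad k \approx 10 \;\Rightarrow\; \bar{c}_{\mathrm{build}} \approx 0.1\,B
\]
```

With no reuse every request pays the full build cost; with James's estimate of about 10 reuses per filter, each request pays roughly a tenth of it. The model ignores the memory held by cached filters and any rebuilding after an NRT reopen.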
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Oct 10, 2013 at 7:01 PM, James Clarke <jcla...@basistech.com> wrote:
>>> Are there any best practices for constructing Filters to search efficiently?
>>> From my non-exhaustive experiments I cannot intuit how to construct my
>>> filters to achieve best performance.
>>>
>>> I have an index (Lucene 4.3) of about 1.8M documents which contain a field
>>> acting as a flag (evidence:true). Initially, all the documents I am
>>> interested in searching have this field. Later, as the index grows, some
>>> documents will not have this field.
>>>
>>> In the simplest case I want to filter on documents with evidence:true. I
>>> ran a couple of hundred queries sequentially and recorded how long the
>>> whole run took to complete:
>>>
>>> * No filter: ~40s
>>> * QueryWrapperFilter(TermQuery(evidence:true)): ~80s
>>> * FieldValueFilter(evidence): ~43s
>>> * TermsFilter(evidence:true): ~50s
>>>
>>> This suggests QWF (QueryWrapperFilter) is a bad idea.
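The totals above can be turned into relative overheads against the unfiltered baseline. Because at this stage essentially every document carries evidence:true (about 1.8M documents match, in an index of about 1.8M documents), the filter excludes almost nothing, so any time above the ~40s baseline is filter cost rather than saved work. A small plain-Java sketch; `FilterOverhead` is a hypothetical name, and since the source timings are approximate, so are the percentages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Relative overhead of each filter variant versus the unfiltered run. */
final class FilterOverhead {

    /** Percentage slowdown of `seconds` relative to `baselineSeconds`. */
    static double overheadPercent(double seconds, double baselineSeconds) {
        return (seconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 40.0; // "No filter: ~40s"
        Map<String, Double> totals = new LinkedHashMap<>();
        totals.put("QueryWrapperFilter(TermQuery(evidence:true))", 80.0);
        totals.put("FieldValueFilter(evidence)", 43.0);
        totals.put("TermsFilter(evidence:true)", 50.0);
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            System.out.printf("%-45s +%.1f%%%n", e.getKey(), overheadPercent(e.getValue(), baseline));
        }
    }
}
```

Running `main` prints +100.0% for QueryWrapperFilter, +7.5% for FieldValueFilter and +25.0% for TermsFilter: the QWF variant doubles the run time for a filter that removes almost nothing.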
>>>
>>> A more complex filter is:
>>>
>>>  (evidence:true AND (cid:x OR cid:y ...) AND language:eng)
>>>
>>> where 1.8M documents match evidence:true, each cid clause matches 2-4
>>> documents, there are 1-60 cid clauses, and 1.4M documents match
>>> language:eng.
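These numbers suggest one hypothesis worth benchmarking. The sketch below is a model, not Lucene code: each clause is represented as a `java.util.BitSet` of doc ids, and `SelectiveConjunction` with its methods is a hypothetical name. The evidence and language clauses match most of the index and (assuming they rarely vary between requests) could be built once and cached, so only the small cid union is request-specific. The intersection can then start from that union and probe the large sets, instead of combining sets that span the whole index.

```java
import java.util.BitSet;
import java.util.List;

/** Model of (evidence:true AND (cid:x OR cid:y ...) AND language:eng) over doc ids. */
final class SelectiveConjunction {

    /** Straightforward form: union the cid clauses, then AND with both large clauses. */
    static BitSet andAll(BitSet evidence, List<BitSet> cidClauses, BitSet language) {
        BitSet result = new BitSet();
        for (BitSet cid : cidClauses) {
            result.or(cid);      // union of the cid clauses
        }
        result.and(evidence);    // touches words spanning the whole index
        result.and(language);
        return result;
    }

    /** Selective-first form: walk only the docs matched by the cid clauses. */
    static BitSet selectiveFirst(BitSet evidence, List<BitSet> cidClauses, BitSet language) {
        BitSet result = new BitSet();
        for (BitSet cid : cidClauses) {
            for (int doc = cid.nextSetBit(0); doc >= 0; doc = cid.nextSetBit(doc + 1)) {
                // Probe the two large (ideally cached) clauses for this doc only.
                if (evidence.get(doc) && language.get(doc)) {
                    result.set(doc);
                }
            }
        }
        return result;
    }
}
```

Both methods return the same set; the difference is that `selectiveFirst` visits at most about 240 doc ids (60 clauses of up to 4 docs each) rather than the full 1.8M-document range. Whether an equivalent reformulation pays off in Lucene 4.3 would need measuring on the real index.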
>>>
>>> Our initial implementation uses QWF of a BooleanQuery(TQ AND BQ(OR) AND TQ)
>>> which takes ~210s.
>>>
>>> Adjusting this to be a BooleanFilter(TermsFilter AND TermsFilter AND
>>> TermsFilter) sees things slow down to ~239s!
>>>
>>> Any advice on optimizing these filters would be appreciated!
>>>
>>> James
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>
>
