Thanks for the tips. I'm still using version 2.4 so I can't use MultiTermQueryWrapperFilter but I'll definitely try to re-group the the terms that are not changing in order to cache them. How can I join several such filters together?
Using FieldCacheTermsFilter sounds promising. Fortunately it is a single value field (our unique doc id). I'll consider very seriously moving to 2.9.1 in order to try it out and see if I can get so real gain from using it or maybe using TermsFilter from contrib. On Sun, Nov 22, 2009 at 6:10 PM, Uwe Schindler <u...@thetaphi.de> wrote: > Maybe this helps you, but read the docs, it will work only with > single-value-fields: > > http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldC > acheTermsFilter.html<http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldC%0AacheTermsFilter.html> > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -----Original Message----- > > From: Eran Sevi [mailto:erans...@gmail.com] > > Sent: Sunday, November 22, 2009 3:49 PM > > To: java-user@lucene.apache.org > > Subject: Efficient filtering advise > > > > Hi, > > > > I have a need to filter my queries using a rather large subset of terms > > (can > > be 10K or even 50K). > > All these terms are sure to exist in the index so the number of results > > can > > be about the same number of terms in the filter. > > The terms are numbers but are not subsequent and are from a large set of > > possible values (so range queries are probably not good for me). > > The index itself is about 1M docs and running even a simple query with > > such > > a large filter takes a lot of time even if the number of results is only > a > > few hundred docs. > > It seems like the speed is affected by the length of the filter even if > > the > > number of results remains more or less the same, which is logical but not > > by > > such a large loss of performance as I'm experiencing (running the query > > with > > a 10K terms filter takes an average of 1s 187ms with 600 results while > > running it with a 50K terms filter takes an average of 5s 207ms with 1000 > > results). > > > > Currently I'm using a QueryFilter with a boolean query in which I "OR" > the > > different terms together. > > I also can't use a cached filter efficiently since the terms to filter on > > change almost every query. > > > > I was wondering if there's a better way to filter my queries so they > won't > > take a few seconds to run? > > > > Thanks in advance for any advise, > > Eran. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >