Op zondag 22 november 2009 17:23:53 schreef Eran Sevi: > Thanks for the tips. > > I'm still using version 2.4 so I can't use MultiTermQueryWrapperFilter but > I'll definitely try to re-group the the terms that are not changing in order > to cache them. > How can I join several such filters together?
There are various ways. OpenBitSet and OpenBitSetDISI can do this, and there's also BooleanFilter and ChainedFilter in contrib. > Using FieldCacheTermsFilter sounds promising. Fortunately it is a single > value field (our unique doc id). Regards, Paul Elschot > I'll consider very seriously moving to 2.9.1 in order to try it out and see > if I can get so real gain from using it or maybe using TermsFilter from > contrib. > > > On Sun, Nov 22, 2009 at 6:10 PM, Uwe Schindler <u...@thetaphi.de> wrote: > > > Maybe this helps you, but read the docs, it will work only with > > single-value-fields: > > > > http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldC > > acheTermsFilter.html<http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldC%0AacheTermsFilter.html> > > > > Uwe > > > > ----- > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -----Original Message----- > > > From: Eran Sevi [mailto:erans...@gmail.com] > > > Sent: Sunday, November 22, 2009 3:49 PM > > > To: java-user@lucene.apache.org > > > Subject: Efficient filtering advise > > > > > > Hi, > > > > > > I have a need to filter my queries using a rather large subset of terms > > > (can > > > be 10K or even 50K). > > > All these terms are sure to exist in the index so the number of results > > > can > > > be about the same number of terms in the filter. > > > The terms are numbers but are not subsequent and are from a large set of > > > possible values (so range queries are probably not good for me). > > > The index itself is about 1M docs and running even a simple query with > > > such > > > a large filter takes a lot of time even if the number of results is only > > a > > > few hundred docs. > > > It seems like the speed is affected by the length of the filter even if > > > the > > > number of results remains more or less the same, which is logical but not > > > by > > > such a large loss of performance as I'm experiencing (running the query > > > with > > > a 10K terms filter takes an average of 1s 187ms with 600 results while > > > running it with a 50K terms filter takes an average of 5s 207ms with 1000 > > > results). > > > > > > Currently I'm using a QueryFilter with a boolean query in which I "OR" > > the > > > different terms together. > > > I also can't use a cached filter efficiently since the terms to filter on > > > change almost every query. > > > > > > I was wondering if there's a better way to filter my queries so they > > won't > > > take a few seconds to run? > > > > > > Thanks in advance for any advise, > > > Eran. > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org