A disjunction (OR) query is much slower than a conjunction (AND) query. That's why many search engines use conjunction as the default.

By the way, you say you have 5,000,000 documents. How many documents match your query? Do you need the results sorted by relevance score, or do you just want the matches and don't care about the order? If you don't care about the order, you could try a filter, e.g.:

    // parser, terms and searcher are assumed to exist already
    Query allDocsQuery = parser.parse("*:*");
    TermsFilter cityFilter = new TermsFilter();
    for (String term : terms) {
        cityFilter.addTerm(new Term("city", term));
    }
    searcher.search(allDocsQuery, cityFilter);
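To make the filter idea more concrete, here is a minimal plain-Java sketch (not Lucene code) of evaluating a query shaped like "(a1 or a2 ...) and (b1 or b2 ...)" with bitsets: each OR group sets one bit per matching document, and the two groups are then intersected. The class BitSetOrSketch, its in-memory postings map and the example terms are all made up for illustration; a real filter would read the postings from the index, and no scoring is done here.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BitSetOrSketch {
    // Toy inverted index: term -> doc ids. Hypothetical stand-in for one indexed field.
    private final Map<String, int[]> postings = new HashMap<>();
    private final int maxDoc;

    public BitSetOrSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    public void addPostings(String term, int... docIds) {
        postings.put(term, docIds);
    }

    // OR group: visit every term and set one bit per document that contains it.
    public BitSet anyOf(List<String> terms) {
        BitSet bits = new BitSet(maxDoc);
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in the index
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // (a1 OR a2 ...) AND (b1 OR b2 ...): intersect the two OR groups.
    public BitSet anyOfBoth(List<String> groupA, List<String> groupB) {
        BitSet result = anyOf(groupA);
        result.and(anyOf(groupB));
        return result;
    }

    public static void main(String[] args) {
        BitSetOrSketch index = new BitSetOrSketch(8);
        index.addPostings("california", 0, 2, 5);
        index.addPostings("newyork", 1, 5, 7);
        index.addPostings("pollution", 2, 3, 7);
        index.addPostings("alcohol", 4, 5);

        BitSet hits = index.anyOfBoth(
                Arrays.asList("california", "newyork"),
                Arrays.asList("pollution", "alcohol"));
        System.out.println(hits); // prints {2, 5, 7}
    }
}
```

The cost here grows with the total length of the posting lists rather than with the number of clauses a scorer has to keep in sync, which is why this shape can pay off when you don't need scores.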
I am not sure this method is faster than a boolean OR query. In theory, BooleanScorer is a TAAT (term-at-a-time) method: it traverses each term within a 2k window. BooleanScorer2 is a DAAT (document-at-a-time) algorithm. BooleanScorer is faster than BooleanScorer2, but it can't support required or exclusive clauses, and the term count must be less than 32 (because it uses a 32-bit integer to remember which terms hit). TermsFilter is similar to BooleanScorer: it traverses all terms and uses a bitset to mark the documents that hit. If the number of matching documents is very large, it may be faster than BooleanScorer2.

On Tue, May 8, 2012 at 6:54 PM, 齐保元 <qibaoy...@126.com> wrote:
> Thanks for your reply, firstly. So many OR queries are there to monitor
> the terms. One scenario is this: if I want to know the cities of a
> province and the events that happen there, I may instantiate the query
> like "(California or NewYork or SanFransico.... or SomePlace) and
> (Pollution or Criminal ... or Alcohol)". So the long query happens... I
> hope I have described the question clearly.
> ----------------
> At 2012-05-08 18:44:13,"Li Li" <fancye...@gmail.com> wrote:
>>a disjunction (or) query of so many terms is indeed slow.
>>can you describe your real problem? why do you need the disjunction
>>results of so many terms?
>>
>>On Sun, May 6, 2012 at 9:57 PM, qibaoy...@126.com <qibaoy...@126.com> wrote:
>>> Hi,
>>> I met a problem about how to search many keywords in about 5,000,000
>>> documents. For example the query may be like "(a1 or a2 or a3 ....a200)
>>> and (b1 or b2 or b3 or b4 ..... b400)". I found it will take a very long
>>> time (40 seconds) to get the answer in only one field (Title field), and
>>> the JVM will throw an OutOfMemory error with more fields (title field
>>> plus content field). Any suggestions or good ideas to solve this
>>> problem? Thanks in advance.
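As a rough illustration of the two traversal strategies named in the reply above, here is a minimal plain-Java sketch of an OR over sorted posting lists, once term-at-a-time inside fixed windows and once document-at-a-time with a priority queue. It is not Lucene's implementation: the class TraversalSketch and both method names are hypothetical, the window size 2048 is just my reading of the "2k window" above, and scoring as well as the 32-bit mask are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TraversalSketch {
    static final int WINDOW = 2048; // assumed reading of the "2k window"

    // Term-at-a-time inside each window: finish one term's postings for the
    // current window before moving to the next term, then emit the window.
    // Assumes every posting list is sorted ascending with ids in [0, maxDoc).
    static List<Integer> termAtATime(int[][] postings, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int[] cursor = new int[postings.length]; // next unread position per term
        boolean[] bucket = new boolean[WINDOW];  // one slot per doc in the window
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);
            for (int t = 0; t < postings.length; t++) {
                int[] docs = postings[t];
                while (cursor[t] < docs.length && docs[cursor[t]] < end) {
                    bucket[docs[cursor[t]] - base] = true;
                    cursor[t]++;
                }
            }
            for (int i = 0; i < end - base; i++) {
                if (bucket[i]) {
                    hits.add(base + i);
                    bucket[i] = false; // reset the slot for the next window
                }
            }
        }
        return hits;
    }

    // Document-at-a-time: always continue with the smallest current doc id
    // across all terms. Queue entries are {termIndex, positionInPostings}.
    static List<Integer> docAtATime(int[][] postings) {
        PriorityQueue<int[]> queue = new PriorityQueue<>(
                (a, b) -> Integer.compare(postings[a[0]][a[1]], postings[b[0]][b[1]]));
        for (int t = 0; t < postings.length; t++) {
            if (postings[t].length > 0) {
                queue.add(new int[] {t, 0});
            }
        }
        List<Integer> hits = new ArrayList<>();
        while (!queue.isEmpty()) {
            int doc = postings[queue.peek()[0]][queue.peek()[1]];
            hits.add(doc);
            // Advance every term that is currently positioned on this doc.
            while (!queue.isEmpty() && postings[queue.peek()[0]][queue.peek()[1]] == doc) {
                int[] top = queue.poll();
                if (top[1] + 1 < postings[top[0]].length) {
                    queue.add(new int[] {top[0], top[1] + 1});
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 5, 3000}, {5, 2047, 2048}, {}};
        System.out.println(termAtATime(postings, 4096)); // prints [1, 5, 2047, 2048, 3000]
        System.out.println(docAtATime(postings));        // prints [1, 5, 2047, 2048, 3000]
    }
}
```

Both methods return the same documents. The difference is only in the access pattern: the windowed version does simple sequential writes per term, while the queue version pays a heap operation per posting but can be combined with required clauses more easily.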
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org