How to improve performance of large numbers of successive searches?

Chris McGee Thu, 10 Apr 2008 11:35:40 -0700

Hello,

I am building fairly large directories (200-500 MB of disk space) using 
lucene-java. Sometimes it can take upwards of 10-15 mins to create the 
documents and write them to disk using my current configuration. I have 
upgraded to the latest 2.3.1 version and followed many of the 
recommendations offered on the wiki:


http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

These tips have significantly improved the time to build the directory and 
search it. However, I have noticed that when I perform term queries using 
a searcher many times in rapid succession and iterate over all of the hits 
it can take a significant time. To perform 1000 term query searches each 
with around 2000 hits it takes well over a minute. The time seems to vary 
linearly based on the number of searches (ie. 10 times more searches take 
10 times longer). I tried combining the searches into a BooleanQuery but 
it only shaves off a small percentage (5-10%) of the total time.

I was wondering if there is a faster way to retrieve all of the results 
for my large collections of terms without using more memory and without 
taking more time to build the directory? I already looked at bypassing the 
searcher and using the IndexReader.termDocs() method directly to retrieve 
the documents but there did not seem to be much performance improvement. 
In the majority of my cases I am simplying looking for a large number of 
values to the same field. Also, I'm not interested in scoring results 
based on frequency or weights I need to retrieve all of the results 
anyway.

Any help with this would be great.

Thanks,
Chris McGee

How to improve performance of large numbers of successive searches?

Reply via email to