We recently upgraded from Lucene 2.4.0 to Lucene 3.0.2.  Our load testing 
revealed a serious performance drop specific to traversing the list of terms 
and their associated documents for a given indexed field.  Our code looks 
something like this:

for (Term term : terms) {
    TermDocs termDocs = indexReader.termDocs(term);
    while (termDocs.next()) {   // much slower here
        int doc = termDocs.doc();
        ...do something with each doc...
    }
}


The slowness is all on the first call to TermDocs.next() for each term.  
Comparing 2.4.0 and 3.0.2 further, we found new synchronization in the 
SegmentTermDocs constructor and in SegmentReader.getTermsReader().  The first 
call to next() hits this synchronization, causing a 4x slowdown on an 8-CPU 
machine.

My first question: is there a different, more efficient approach we should be 
using to process each term's doc list?  The synchronization appears to guard 
aspects of these classes that the next() operation is not concerned with.
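
One alternative we have not yet measured is to open a single unpositioned 
TermDocs with the no-argument indexReader.termDocs() and reposition it with 
seek(term) for each term, so the constructor runs once rather than once per 
term.  Whether this avoids the contention is an assumption, since seek() may 
still reach getTermsReader().  The sketch below shows only the loop shape.  
TermDocsLike, InMemoryTermDocs and collectDocs are hypothetical stand-ins so 
the example compiles without Lucene on the classpath; they are not Lucene 
classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ReuseTermDocsSketch {

    // Hypothetical stand-in for the subset of org.apache.lucene.index.TermDocs
    // used here. The real interface takes a Term in seek() and its methods
    // throw IOException; both are simplified away in this sketch.
    interface TermDocsLike {
        void seek(String term);
        boolean next();
        int doc();
        void close();
    }

    // Hypothetical in-memory postings cursor, only so the sketch can be run.
    static class InMemoryTermDocs implements TermDocsLike {
        private final Map<String, int[]> postings;
        private int[] current = new int[0];
        private int pos = -1;
        boolean closed = false;

        InMemoryTermDocs(Map<String, int[]> postings) {
            this.postings = postings;
        }

        public void seek(String term) {
            int[] docs = postings.get(term);
            current = (docs != null) ? docs : new int[0];
            pos = -1;  // positioned before the first doc, as after a real seek
        }

        public boolean next() {
            return ++pos < current.length;
        }

        public int doc() {
            return current[pos];
        }

        public void close() {
            closed = true;
        }
    }

    // Walks every term with ONE cursor instead of opening a new one per term.
    // With Lucene 3.0.x the cursor would come from indexReader.termDocs().
    static List<Integer> collectDocs(TermDocsLike termDocs, Iterable<String> terms) {
        List<Integer> docs = new ArrayList<Integer>();
        try {
            for (String term : terms) {
                termDocs.seek(term);          // reposition, no new cursor
                while (termDocs.next()) {
                    docs.add(termDocs.doc()); // ...do something with each doc...
                }
            }
        } finally {
            termDocs.close();                 // release the cursor once, at the end
        }
        return docs;
    }
}
```

In our real code the cursor would be created once per thread, since a TermDocs 
instance is not meant to be shared between threads.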

My other question: are there any planned performance enhancements to address 
this loss of performance?

Thanks.

John

