Lucene reorganizing indexes

Scott Smith Mon, 16 Jul 2012 13:29:59 -0700

We have an application that has to do "real time" indexing of a number of 
documents.  About every 20 seconds it wakes up and updates the index with any 
changes that have been queued since the last run.  Each update involves adding 
and deleting several hundred documents, and it is all done in a single thread.  
Multiple threads may be searching at the same time as the update thread runs 
(the searches run in a different process).
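
The update cycle described above can be sketched roughly as follows.  This is 
an illustrative sketch only, not the application's actual code: the names 
BatchIndexUpdater, Change, and IndexSink are hypothetical, and IndexSink is an 
assumed stand-in for whatever Lucene writer calls the real application makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a periodic, single-threaded batch updater. */
class BatchIndexUpdater {

    /** A queued change: add/replace a document, or delete it by id. */
    static final class Change {
        final String docId;
        final String body; // null means "delete this document"

        Change(String docId, String body) {
            this.docId = docId;
            this.body = body;
        }

        boolean isDelete() {
            return body == null;
        }
    }

    /** Assumed stand-in for the real index writer; not a Lucene interface. */
    interface IndexSink {
        void addOrReplace(String docId, String body);
        void delete(String docId);
        void commit();
    }

    private final BlockingQueue<Change> pending = new LinkedBlockingQueue<Change>();
    private final IndexSink sink;

    BatchIndexUpdater(IndexSink sink) {
        this.sink = sink;
    }

    /** Called by producers whenever a document changes. */
    void enqueue(Change change) {
        pending.add(change);
    }

    /** Applies everything queued since the last run; returns the batch size. */
    int runOnce() {
        List<Change> batch = new ArrayList<Change>();
        pending.drainTo(batch);
        if (batch.isEmpty()) {
            return 0; // nothing queued, so skip the commit entirely
        }
        for (Change c : batch) {
            if (c.isDelete()) {
                sink.delete(c.docId);
            } else {
                sink.addOrReplace(c.docId, c.body);
            }
        }
        sink.commit(); // one commit per batch, not one per document
        return batch.size();
    }

    /** Starts the single update thread, waking every periodSeconds. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                runOnce();
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return exec;
    }
}
```

Calling start(20) would give the 20-second cycle described above; the caller 
is responsible for shutting down the returned executor.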


Back in the days of 1.42, we would force an index optimization once each day.  
However, my impression is that in later versions of Lucene (we are currently 
using 3.5), Lucene will often do its own reorganization when certain criteria 
are met.  I've been told that optimizing the index is, perhaps, no longer 
necessary.  Can someone describe what happens here?

The reason I'm asking is that we see our application periodically using 
excessive amounts of kernel time (on Windows), which normally indicates a lot 
of disk activity.  We are unable to correlate this with anything our code is 
doing.  Obviously, we expect Lucene to cause disk activity; it just seems that 
the last upgrade (we were at 3.02 before going to 3.5) severely increased the 
disk activity, which is interfering with other things running on the boxes.

Does any of this make sense to anyone?  Is there an explanation?  Thoughts 
about what we might do about it?

Thanks in advance.

Scott
