Merging with IndexWriter.addIndexes(...)

J.J. Larrea Mon, 28 Nov 2005 16:10:14 -0800

My application needs to simultaneously process record additions and updates with one pass through a database. That's not in itself a problem: I open an IndexReader on the existing index to mark the prior versions of updated records as deleted Documents, and an IndexWriter on a new empty index to accept the new and updated records as new Documents. Then it's simply a matter of merging the two indexes into one.

One possibility is to directly merge from the IndexReader (existing index) into the IndexWriter (new index). The problem is that it takes a very long time, on the order of several hours, since it must replicate everything in the old index (2.7Gb, 2.2M Documents) in the new one. Since the update consists of a small number of records, typically under 1000 and sometimes just a handful, this is a waste of cycles.

Anticipating that, my code reverses the relationship: at the end of the loop it closes the index objects, then opens an IndexWriter on the existing (target) index and an IndexReader on the index which has the new records, which gets merged into the target. It functions as expected and desired, but is also far more time-consuming than reasonable.

For example, if I take an empty (no Document) or small (100 Documents) index and merge one or other via IndexWriter.addIndexes() into an already optimized single-segment index, what should in theory be a null or trivial operation ends up in practice unpacking the destination index into many segments, and then repacking them into a single segment. For that 2.7Gb index it takes over an hour.

In contrast, it takes only a few seconds to add those same 100 Documents directly to the index via addDocument(), as long as one doesn't optimize.
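
A rough way to see why the two paths differ so much is a toy cost model. The Java sketch below is not Lucene code and calls no Lucene API: the class and method names are hypothetical, a segment is reduced to a document count, and the assumption is that collapsing several segments into one rewrites every document in them. It does not model the unpacking into many segments described above, only the cost of a final single-segment merge.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy cost model (hypothetical, not Lucene): counts documents written by each strategy. */
public class MergeCostSketch {

    /** Append the new documents as their own segment; only they are written. */
    public static long appendOnly(List<Long> segments, long newDocs) {
        segments.add(newDocs);
        return newDocs;
    }

    /** Collapse all segments into one; assumed to rewrite every document unless already single. */
    public static long mergeToSingleSegment(List<Long> segments) {
        if (segments.size() <= 1) {
            return 0; // already zero or one segment: nothing to rewrite
        }
        long total = 0;
        for (long size : segments) {
            total += size;
        }
        segments.clear();
        segments.add(total);
        return total;
    }

    /** Model of a merge that is bracketed by single-segment merges before and after. */
    public static long addThenOptimize(List<Long> segments, long newDocs) {
        long written = mergeToSingleSegment(segments); // "optimize" before
        written += appendOnly(segments, newDocs);      // bring in the new segment
        written += mergeToSingleSegment(segments);     // "optimize" after
        return written;
    }

    public static void main(String[] args) {
        List<Long> a = new ArrayList<>();
        a.add(2_200_000L); // one optimized segment, as in this post
        List<Long> b = new ArrayList<>(a);

        System.out.println("append only:       " + appendOnly(a, 100));
        System.out.println("add then optimize: " + addThenOptimize(b, 100));
    }
}
```

With the figures from this post (one segment of 2.2M Documents plus 100 new ones), the model writes 100 documents in the append-only case and 2,200,200 in the bracketed case, which is the same order-of-magnitude gap as a few seconds versus over an hour.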

So... I notice that both IndexWriter.addIndexes(...) merge methods start and end with calls to optimize() on the target index. I'm not sure whether that is causing the unpacking and repacking I observe, but it does make me wonder whether they truly need to be there:

- Is starting off with zero or 1 segment (per a comment in the source code) truly a precondition for successful merging?

- Since one can open an unoptimized index and add millions of Documents (creating potentially hundreds of segments) via addDocument, with optimization left entirely optional and up to the user, why is optimization required as a postcondition for merging?


Any advice on this, or on the general application design, would be appreciated.

Thanks,
J.J. Larrea

PS: This was tested against SVN trunk revision 329490
