J.J. Larrea wrote:
So... I notice that both IndexWriter.addIndexes(...) merge methods start and end with calls to optimize() on the target index. I'm not sure whether that is causing the unpacking and repacking I observe, but it does make me wonder whether they truly need to be there:

I don't recall exactly why this was done. (I should have written a comment!)

I think the concerns in addIndexes are that, before the segments file is written:

1. The segments must be sorted by size, with small segments on top, in order for future incremental merging to work correctly.
2. Segment names must be unique and less than the segment counter, so that they will not conflict with future segment names.
3. All segments must be stored in the same directory.
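
To make the three conditions above concrete, here is a minimal sketch of a check over a toy model of the segment stack. It is not Lucene code: the Segment class, the integer ids standing in for segment names, the string standing in for a Directory, and the isValid method are all hypothetical names invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names exist in Lucene.
final class SegmentStackCheck {
    static final class Segment {
        final int id;        // stands in for the segment name
        final int docCount;  // stands in for the segment size
        final String dir;    // stands in for the Directory holding the segment

        Segment(int id, int docCount, String dir) {
            this.id = id;
            this.docCount = docCount;
            this.dir = dir;
        }
    }

    // The list runs from the bottom of the stack (index 0) to the top (last).
    static boolean isValid(List<Segment> stack, int counter, String targetDir) {
        Set<Integer> seen = new HashSet<>();
        int sizeBelow = Integer.MAX_VALUE;
        for (Segment s : stack) {
            // 1. Sorted by size, small segments on top.
            if (s.docCount > sizeBelow) return false;
            // 2. Ids are unique and less than the segment counter.
            if (s.id >= counter || !seen.add(s.id)) return false;
            // 3. Every segment lives in the target directory.
            if (!s.dir.equals(targetDir)) return false;
            sizeBelow = s.docCount;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Segment> stack = Arrays.asList(
                new Segment(0, 1000, "target"),
                new Segment(1, 100, "target"),
                new Segment(2, 10, "target"));
        System.out.println(isValid(stack, 3, "target")); // true
        System.out.println(isValid(stack, 2, "target")); // false: id 2 is not below the counter
    }
}
```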

Optimizing before and after was a cheap way to ensure these, although that explains only the final optimize, not the first. I'm sure there was a reason for the first one, but I'm no longer sure that it is valid.

Note that the two addIndexes methods currently use different algorithms. I think addIndexes(Directory[]) uses a merge algorithm that observes mergeFactor, while addIndexes(IndexReader[]) does not, since all of the indexes are already open.

An improved algorithm for addIndexes(Directory[]) might be to:

0. Check that none of the Directories are the same as this directory.
1. Don't optimize first.
2. Run the existing algorithm, which combines the added segments until they are fewer than mergeFactor.
3. (new) Merge any segments that are not in this directory. This will require first moving them to the top of the stack.
4. Re-sort the stack by size.
5. Don't optimize at end.
4. Re-sort the stack by size.
5. Don't optimize at end.

I think that should do it.
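
The steps above can be sketched against a toy model of the segment stack. This is an illustration under stated assumptions, not IndexWriter code: AddIndexesSketch, Segment, addLocal and merge are hypothetical names, a string stands in for a Directory, a "merge" simply sums document counts, and the pass-based loop in step 2 is only a rough stand-in for the existing algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: none of these names exist in Lucene.
final class AddIndexesSketch {
    static final class Segment {
        final int id;
        final int docCount;
        final String dir;

        Segment(int id, int docCount, String dir) {
            this.id = id;
            this.docCount = docCount;
            this.dir = dir;
        }
    }

    final String targetDir;
    final int mergeFactor;
    final List<Segment> stack = new ArrayList<>(); // index 0 = bottom, last = top
    int counter = 0; // next unused segment id in the target index

    AddIndexesSketch(String targetDir, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.targetDir = targetDir;
        this.mergeFactor = mergeFactor;
    }

    // Put a segment that already belongs to the target index on top of the stack.
    void addLocal(int docCount) {
        stack.add(new Segment(counter++, docCount, targetDir));
    }

    // Replace stack[from, to) with one new segment written to the target directory.
    private void merge(int from, int to) {
        List<Segment> range = stack.subList(from, to);
        int docs = 0;
        for (Segment s : range) docs += s.docCount;
        range.clear();
        stack.add(from, new Segment(counter++, docs, targetDir));
    }

    void addIndexes(List<Segment> added) {
        // Step 0: none of the added segments may come from the target directory.
        for (Segment s : added) {
            if (s.dir.equals(targetDir)) {
                throw new IllegalArgumentException("cannot add an index to itself");
            }
        }
        // Step 1: no optimize first; just push the added segments.
        int start = stack.size();
        stack.addAll(added);
        // Step 2: combine added segments until fewer than mergeFactor remain.
        while (stack.size() - start >= mergeFactor) {
            for (int base = start; base < stack.size(); base++) {
                int end = Math.min(stack.size(), base + mergeFactor);
                if (end - base > 1) merge(base, end);
            }
        }
        // Step 3: move segments still outside the target directory to the top
        // of the stack, then merge them so they get a fresh id and a local copy.
        List<Segment> foreign = new ArrayList<>();
        for (Segment s : stack) {
            if (!s.dir.equals(targetDir)) foreign.add(s);
        }
        if (!foreign.isEmpty()) {
            stack.removeAll(foreign);
            int top = stack.size();
            stack.addAll(foreign);
            merge(top, stack.size());
        }
        // Step 4: re-sort by size, largest at the bottom, smallest on top.
        stack.sort(Comparator.comparingInt((Segment s) -> s.docCount).reversed());
        // Step 5: no optimize at the end.
    }

    public static void main(String[] args) {
        AddIndexesSketch target = new AddIndexesSketch("target", 3);
        target.addLocal(1000);
        target.addLocal(100);
        List<Segment> added = new ArrayList<>();
        for (int i = 0; i < 4; i++) added.add(new Segment(i, 10, "other"));
        target.addIndexes(added);
        for (Segment s : target.stack) {
            System.out.println(s.id + " " + s.docCount + " " + s.dir);
        }
    }
}
```

In this run, with mergeFactor 3 and four added segments, step 2 merges three of them and leaves one behind in the other directory, so step 3 is what brings that last segment into the target directory under a fresh id.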

Doug
