On Wednesday 12 March 2008 19:36:57 Michael McCandless wrote:

> OK, I think very likely this is the issue: when IndexWriter hits an
> exception while processing a document, the portion of the document
> already indexed is left in the index, and then its docID is marked
> for deletion. You can see these deletions in your infoStream:
>
>   flush 0 buffered deleted terms and 30 deleted docIDs on 20 segments
>
> This means you have deletions in your index, by docID, and so when
> you optimize the docIDs are then compacted.
Aha. Under 2.2, a failure would result in nothing being added to the text
index, so this would explain the problem. It would also explain why smaller
data sets are less likely to trigger it (there is less chance of a bad
document among them).

Workarounds?

- flush() after any IOException from addDocument() (extra overhead?)
- use a counter (++) to determine the next document ID instead of
  index.getWriter().docCount() (goes out of sync after an error, but
  fixes itself on optimize())
- use a stored field for a separate ID (slower later when reading the
  index); a rough sketch follows below
- ???

Daniel
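P.S. Roughly what I mean by the separate-ID workaround, as an untested
sketch against the 2.3 API (the "appId" and "contents" field names, the
helper class, and the nextId counter are just placeholders):

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    public class StoredIdIndexer {

        private final IndexWriter writer;
        // Application-level counter; never derived from docCount(), so a
        // failed addDocument() leaving a deleted slot cannot skew it.
        private int nextId = 0;

        public StoredIdIndexer(Directory dir) throws IOException {
            writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        }

        /** Returns the ID assigned to the document, or -1 on failure. */
        public int addDocument(String text) {
            Document doc = new Document();
            // Store our own ID in a field instead of relying on docIDs,
            // which shift when optimize() compacts deletions.
            doc.add(new Field("appId", Integer.toString(nextId),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("contents", text,
                    Field.Store.NO, Field.Index.TOKENIZED));
            try {
                writer.addDocument(doc);
            } catch (IOException e) {
                // The partially indexed document is marked deleted by
                // IndexWriter; we simply do not hand out this ID.
                return -1;
            }
            return nextId++;
        }

        public void close() throws IOException {
            writer.optimize();
            writer.close();
        }
    }

The extra cost is only paid when mapping hits back to application IDs
(reading the stored "appId" field), but the IDs survive errors and
optimize() unchanged.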