On Wednesday 12 March 2008 19:36:57 Michael McCandless wrote:

> OK, I think very likely this is the issue: when IndexWriter hits an
> exception while processing a document, the portion of the document
> already indexed is left in the index, and then its docID is marked
> for deletion. You can see these deletions in your infoStream:
>
>   flush 0 buffered deleted terms and 30 deleted docIDs on 20 segments
>
> This means you have deletions in your index, by docID, and so when
> you optimize the docIDs are then compacted.
Aha. Under 2.2, a failure would result in nothing being added to the text
index, so this would explain the problem. It would also explain why smaller
data sets are less likely to trigger it (there is less chance of a bad
document among them).

Workarounds?

- flush() after any IOException from addDocument() (extra overhead?)
- use a counter (++) to determine the next document ID instead of
  index.getWriter().docCount() (goes out of sync after an error, but
  fixes itself on optimize())
- use a stored field for a separate ID (slower later when reading the
  index); a rough sketch follows below
- ???

Daniel
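P.S. Roughly what I mean by the separate-ID workaround, as an untested
sketch against the 2.3 API (the "appId" and "contents" field names, the
helper class, and the nextId counter are just placeholders):

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    public class StoredIdIndexer {

        private final IndexWriter writer;
        // Application-level counter; never derived from docCount(), so a
        // failed addDocument() leaving a deleted slot cannot skew it.
        private int nextId = 0;

        public StoredIdIndexer(Directory dir) throws IOException {
            writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        }

        /** Returns the ID assigned to the document, or -1 on failure. */
        public int addDocument(String text) {
            Document doc = new Document();
            // Store our own ID in a field instead of relying on docIDs,
            // which shift when optimize() compacts deletions.
            doc.add(new Field("appId", Integer.toString(nextId),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("contents", text,
                    Field.Store.NO, Field.Index.TOKENIZED));
            try {
                writer.addDocument(doc);
            } catch (IOException e) {
                // The partially indexed document is marked deleted by
                // IndexWriter; we simply do not hand out this ID.
                return -1;
            }
            return nextId++;
        }

        public void close() throws IOException {
            writer.optimize();
            writer.close();
        }
    }

The extra cost is only paid when mapping hits back to application IDs
(reading the stored "appId" field), but the IDs survive errors and
optimize() unchanged.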