How about an attribute (fullyIndexed=true/false) to keep track of whether the 
indexing was successful?

We used a similar attribute for a similar problem, but stored it in the 
accompanying database instead.

-h




----- Original Message ----
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, May 19, 2008 4:02:52 AM
Subject: Re: Transaction semantics in Document addition


Dino Korah wrote:
> Hi All,
>
> I am dealing with a situation where a document could possibly have  
> multiple
> attachments to it, and they are all added to the index under a  
> document-id
> (not lucene doc-id). Now if one of the attachments fail to get  
> indexed due
> to failure of any subsystem like the text extraction module, I need  
> to abort
> the complete set of documents under the document id. I add individual
> attachments as a seperate documents in lucene.
>
> Whats the best mechanism to accomplish this.
>
> One possible solution I have in mind is to create a RAMDirectory  
> for each
> document with attachems to it and add everything to the  
> RAMDirectory. Once
> we are satisfied with the indexing, merge it into the main index.
>
> Would that work?

That will work but performance may not be great.

You could also open the IndexWriter with autoCommit=false, and then  
if you hit a failure in the text extraction, call abort() to cancel  
all docs you had added to this IndexWriter?  But that's bad because  
you lose all the documents that had succeeded.

Or, on hitting a failure in one of a document's attachments, you  
could just call deleteDocument on those attachments already added, to  
"undo" them?  This is in fact how IndexWriter itself deals with an  
exception while adding a document (to avoid having only a part of the  
document in the index): if it his an exception after processing some  
number of tokens, it just marks that document as deleted.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to