When you previously saw corruption was it due to an OS or machine
crash (or power cord got pulled)?  If so, you were likely hitting
LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4
at some point) but is not fixed in 2.3.

If that is what you were hitting, then unfortunately neither buffering
updates into RAM nor using autoCommit=false in 2.3 will fully protect
you from this issue.  Though, both of these approaches should reduce
your chance of hitting LUCENE-1044 since they both reduce frequency of
commits to the index.

Mike

Simon Wistow wrote:

I currently have a set up that indexes into RAM and then periodically
merges that into a disk based index.

Searches are done from the disk based index and deletes are handled by
keeping a list of deleted documents, filtering out search results and
applying the deletes to the index at merge time.

All this was done to make sure that we didn't corrupt the index (which
we'd seen happen a few times when the indexing machine failed for
whatever reason). With this scheme if the machine fails then all that's lost is the RAM index and the list of deletes. We then just simply play
back all actions since the last merge and we're back to where we
started.

However it occurred to me that this might all be redundant now with
Lucene 2.3 (it's possible it might have always been redundant come to
think of it) - should I just open a Disk based Index with
autocommit=false and then periodically commit the changes by close() ing
and then re-open()ing the Disk index ? Is that atomic? i.e is there a
situation using this whereby the index could become corrupted?

Thanks,

Simon



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to