I'm not sure exactly how to understand "corrupted indexes" here: do you
mean indexes that cannot be read or used, or something else?
thanks
DT
www.ejinz.com
EjinZ Search Engine
----- Original Message -----
From: "Doron Cohen" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Friday, August 03, 2007 1:03 AM
Subject: Re: How do YOU detect corrupt indexes?
What is the anticipated cause of corruption? Malicious?
Hardware fault? This is somewhat reminiscent of discussions
on the list about encrypting the index. See LUCENE-737
and the discussion it points to. One of the opinions
there was that encryption should be handled at a lower
level (OS/FS). Wouldn't that hold here as well?
Joe R wrote:
Hello,
I've been asked to devise some way to discover and correct data in
Lucene indexes that have been "corrupted." The word "corrupt", in this
case, has a few different meanings, some of which strike me as
exceedingly difficult to grok. What concerns me are the cases where we
don't know that an index has been changed: A bit error in a stored
field, for instance, is a form of corruption that we (ideally) should be
able to identify, at the very least, and hopefully correct. This case in
particular seems particularly onerous, since this isn't going to throw
an exception of any sort, any time.
We have a fairly good handle on how to remedy problems that throw
exceptions, so we should be alright with corruption where (say) an
operator logs in and overwrites a file.
I'm wondering how other Lucene users have tackled this problem in the
past. Calculating checksums on the documents seems like one way to go
about it: compute a checksum on the document and, in a background
thread, compare the checksum to the data. Unfortunately we're building a
large, federated system and it would take months to exhaustively check
every document this way. Checksumming the files themselves might be too
much: We're storing gigabytes of data per index and there is some churn
to the data; in other words, the overhead for this method might be too
high.
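For illustration only, the per-document checksum idea might be sketched
roughly as below. This is not a Lucene API: the class and method names
are hypothetical, the stored fields are modelled as a plain Map of field
name to value, and CRC32 from the JDK is just one possible choice of
checksum. The assumption is that the checksum would be saved alongside
the document at index time and recomputed later by a background task.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/**
 * Sketch only: per-document checksum over stored field values.
 * Hypothetical helper, not part of Lucene.
 */
final class StoredFieldChecksum {

    private StoredFieldChecksum() {}

    /** Computes a CRC32 over all field names and values, sorted by field name. */
    static long checksum(Map<String, String> storedFields) {
        CRC32 crc = new CRC32();
        // Sort by field name so the result does not depend on map iteration order.
        for (Map.Entry<String, String> e : new TreeMap<>(storedFields).entrySet()) {
            update(crc, e.getKey());
            update(crc, e.getValue());
        }
        return crc.getValue();
    }

    /** Returns true if the fields still match the checksum recorded at index time. */
    static boolean verify(Map<String, String> storedFields, long expected) {
        return checksum(storedFields) == expected;
    }

    private static void update(CRC32 crc, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        // Length prefix (4 bytes, big-endian) keeps field boundaries unambiguous.
        crc.update(bytes.length >>> 24);
        crc.update(bytes.length >>> 16);
        crc.update(bytes.length >>> 8);
        crc.update(bytes.length);
        crc.update(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("body", "some stored text");
        long recorded = checksum(doc);      // would be saved at index time

        doc.put("body", "some stored texu"); // simulate a silent change
        System.out.println("intact: " + verify(doc, recorded));
    }
}
```

A background thread could walk the documents and call verify() on each
one, which is exactly the slow exhaustive pass described above; the
sketch does nothing to reduce that cost, and it can only detect a
change, not correct it.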
Thanks for any help you might have.
-Joseph Rose
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]