[
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071686#comment-14071686
]
Robert Muir commented on LUCENE-5842:
-------------------------------------
By the way, as a follow-up, we can do even better and iterate a bit more:
Today each file by itself can be 'correct', but you can still have a corrupt
index because the files are mismatched somehow (network replication, or some
other bug).
It might be worth thinking about reviving segmentinfo.attributes (that's
cleanest, I think), or putting it in the files map directly (which would be
harder, as it enforces that files have checksums). We could store each file's
checksum there, and when we retrieve it here, validate it against that
attribute. This would detect mismatched files.
Ideally, though, we'd do this for the commit too (for deletes and dv updates).
Anyway, just something to explore on another issue if we can do it without
creating a mess. I don't like how we can't detect such mismatches today (except
via very rudimentary checks like livedocs.length = maxdoc, etc.).
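A minimal sketch of the per-file checksum idea above, using only the JDK. The class name, the "checksum." key prefix, and the helper methods are hypothetical and stand in for whatever segmentinfo.attributes would actually store. This is not Lucene API, just an illustration of recording a checksum per file at write time and comparing it at open time.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

class SegmentChecksumSketch {
    // Hypothetical attribute key prefix; not a Lucene constant.
    static final String KEY_PREFIX = "checksum.";

    // Computes a CRC32 over the given bytes (stand-in for a file's footer checksum).
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Write time: remember the file's checksum in the segment attributes.
    static void record(Map<String, String> attributes, String fileName, long checksum) {
        attributes.put(KEY_PREFIX + fileName, Long.toString(checksum));
    }

    // Open time: true only if a checksum was recorded for this file
    // and it equals the one read back from the file itself.
    static boolean matches(Map<String, String> attributes, String fileName, long checksumFromFile) {
        String expected = attributes.get(KEY_PREFIX + fileName);
        return expected != null && expected.equals(Long.toString(checksumFromFile));
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        long written = crc32("postings, generation 1".getBytes(StandardCharsets.UTF_8));
        long replicated = crc32("postings, generation 2".getBytes(StandardCharsets.UTF_8));

        record(attributes, "_0.doc", written);
        System.out.println("same file:       " + matches(attributes, "_0.doc", written));
        System.out.println("mismatched file: " + matches(attributes, "_0.doc", replicated));
    }
}
```

Each file in the second case would still pass its own checksum; only the comparison against the recorded value exposes the mismatch.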
> Validate checksum footers for postings lists, docvalues, storedfields,
> termvectors on init
> ------------------------------------------------------------------------------------------
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently
> validate the checksum on reader init.
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on
> // open, but for now we at least verify proper structure of the checksum
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}
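To illustrate why the footer-only check is cheap and what it can and cannot catch, here is a self-contained sketch using only the JDK. It assumes a 16-byte big-endian footer of magic, algorithm ID, and checksum; the constant value, the class, and the method names are assumptions for illustration and are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class FooterCheckSketch {
    // Assumed footer layout: int magic, int algorithmID, long checksum.
    static final int FOOTER_MAGIC = ~0x3fd76c17; // assumed value, for illustration only
    static final int FOOTER_LENGTH = 16;

    // Appends a footer to the body; the CRC32 covers the body, the magic and the algorithm ID.
    static byte[] withFooter(byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(body.length + FOOTER_LENGTH);
        buf.put(body).putInt(FOOTER_MAGIC).putInt(0);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, body.length + 8);
        buf.putLong(crc.getValue());
        return buf.array();
    }

    // Cheap structural check: looks only at the last 16 bytes and returns the
    // stored checksum. It does NOT recompute the checksum over the body.
    static long readFooterChecksum(byte[] file) {
        if (file.length < FOOTER_LENGTH) {
            throw new IllegalStateException("file is shorter than a footer");
        }
        ByteBuffer footer = ByteBuffer.wrap(file, file.length - FOOTER_LENGTH, FOOTER_LENGTH);
        if (footer.getInt() != FOOTER_MAGIC) {
            throw new IllegalStateException("footer magic mismatch (truncated file?)");
        }
        if (footer.getInt() != 0) {
            throw new IllegalStateException("unknown checksum algorithm ID");
        }
        return footer.getLong();
    }

    public static void main(String[] args) {
        byte[] file = withFooter("stored fields data".getBytes());
        System.out.println("stored checksum: " + readFooterChecksum(file));

        // A truncated copy no longer ends in a well-formed footer.
        byte[] truncated = java.util.Arrays.copyOf(file, file.length - 5);
        try {
            readFooterChecksum(truncated);
        } catch (IllegalStateException e) {
            System.out.println("truncation detected: " + e.getMessage());
        }
    }
}
```

A flipped byte in the middle of the body passes this check unnoticed, which is why it only detects some forms of corruption; catching that case requires reading all the bytes.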