[
https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736252#comment-16736252
]
Armin Braun commented on LUCENE-8525:
-------------------------------------
[~rcmuir] I looked into this again on the ES side and I don't think we
can/should blanket-interpret all `IOException`s as corruption in the caller.
There are a number of possible `IOException`s, e.g. "too many open
files" exceptions, that definitely aren't corruption.
So if Lucene doesn't throw a different exception type for corruption cases, the
caller will either have to maintain a list of `IOException`s that will
not be interpreted as corruption (which seems impossible since the messages on
these could be system dependent) or start maintaining logic to interpret known
`IOException`s that Lucene throws on corrupt data, like `EOFException` (which
ES already handles specifically
[https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/store/Store.java#L201]
... with the TODO basically asking for what this issue is asking for). In the
case of the linked
[https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/store/Store.java#L201]
we'd even have to start looking into the exception message to interpret it
correctly.
I don't think having to interpret exception messages in the caller is in any
way correct (and it would be extremely hard to maintain, since we'd have to be
on the lookout for changes to these messages). On the other hand, it seems like
it would be fairly straightforward to replace these plain `IOException` throws
and wrap the `EOFException` in Lucene with a more specific exception type. Is
there any good reason not to change this in Lucene?
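
A minimal sketch of the idea, not a proposed patch: `CorruptDataException`, `ByteSource` and `CorruptionAwareReader` are hypothetical names made up for illustration (the existing Lucene type is `org.apache.lucene.index.CorruptIndexException`, which also extends `IOException`). It assumes a truncated read surfaces as `EOFException`, wraps it in the dedicated type, and lets the caller classify by type instead of by message.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for a dedicated corruption exception type.
class CorruptDataException extends IOException {
    CorruptDataException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical byte source, used only to keep this sketch self-contained.
interface ByteSource {
    int readByte() throws IOException;
}

final class CorruptionAwareReader {
    // Reads a 4-byte big-endian int. A premature end of data is rethrown as
    // CorruptDataException; any other IOException propagates unchanged.
    static int readInt(ByteSource in, String resource) throws IOException {
        try {
            int value = 0;
            for (int i = 0; i < 4; i++) {
                value = (value << 8) | (in.readByte() & 0xFF);
            }
            return value;
        } catch (EOFException e) {
            throw new CorruptDataException("truncated data in " + resource, e);
        }
    }

    // Caller-side check: decides by exception type only, never by message text.
    static boolean isCorruption(IOException e) {
        return e instanceof CorruptDataException;
    }
}
```

With something like this, a "too many open files" `IOException` would still reach the caller as a plain `IOException` and would not be treated as corruption.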
> throw more specific exception on data corruption
> ------------------------------------------------
>
> Key: LUCENE-8525
> URL: https://issues.apache.org/jira/browse/LUCENE-8525
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Vladimir Dolzhenko
> Priority: Major
>
> DataInput throws generic IOException if data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
>
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
> and maybe
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch
> #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it threw a more specific exception.
> As a consequence
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
> violates its own contract
> {code:java}
> /**
> * @throws CorruptIndexException if the index is corrupt
> * @throws IOException if there is a low-level IO error
> */
> {code}