[ 
https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736252#comment-16736252
 ] 

Armin Braun commented on LUCENE-8525:
-------------------------------------

[~rcmuir] I looked into this again on the ES side, and I don't think we 
can or should blanket-interpret every `IOException` as corruption in the caller.
There are a number of possible `IOException`s, e.g. "too many open files" 
exceptions, that definitely aren't corruption.
So if Lucene doesn't throw a different exception type for corruption cases, the 
caller will either have to maintain a list of `IOException`s that must not be 
interpreted as corruption (which seems impossible, since the messages on these 
can be system dependent) or start maintaining logic to recognize the known 
`IOException`s that Lucene throws on corrupt data, like `EOFException` (which 
ES already handles specifically, see 
[https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/store/Store.java#L201]
 ... with the TODO basically asking for what this issue is asking for). In the 
case of the linked 
[https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/store/Store.java#L201]
 we'd even have to start looking into the exception message to interpret it 
correctly.
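
To make the caller-side problem concrete, here is a minimal sketch of what such 
classification logic ends up looking like. `CorruptionGuess` and `Verdict` are 
hypothetical names made up for illustration, not ES or Lucene code; the only 
assumption taken from above is that `EOFException` is treated as corruption.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical caller-side helper; not actual Elasticsearch or Lucene code.
final class CorruptionGuess {
    enum Verdict { CORRUPT, UNKNOWN }

    // Classifies by exception type only. Anything that is not a known
    // "corruption-shaped" type stays UNKNOWN, because the type alone does not
    // say whether the data or the environment is at fault.
    static Verdict classify(IOException e) {
        if (e instanceof EOFException) {
            return Verdict.CORRUPT; // read past the end of the data: likely truncated
        }
        return Verdict.UNKNOWN;
    }
}
```

A plain `IOException` raised on odd-looking data and a "too many open files" 
failure both come out as `UNKNOWN` here, so the only way left to tell them 
apart is the message text.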

I don't think having to interpret exception messages in the caller is in any 
way correct (and it is extremely hard to maintain, since we'd have to be on the 
lookout for changes to these messages). On the other hand, it seems like it 
would be fairly straightforward to replace these plain `IOException` throws and 
to wrap the `EOFException` in Lucene with a more specific exception type. Is 
there any good reason not to change this in Lucene?
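
As a rough sketch of the kind of change I mean (not a patch): the reader below 
throws a dedicated type where it detects bad data and wraps `EOFException` in 
that same type. `CorruptDataException`, `LengthPrefixedReader` and the 
`maxLength` parameter are hypothetical stand-ins so that the example is 
self-contained; in Lucene the existing `CorruptIndexException` would be the 
natural candidate.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for a dedicated corruption exception type.
class CorruptDataException extends IOException {
    CorruptDataException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical reader for a block stored as a 4-byte length followed by the bytes.
final class LengthPrefixedReader {
    static byte[] readBlock(DataInputStream in, int maxLength) throws IOException {
        try {
            int length = in.readInt();
            if (length < 0 || length > maxLength) {
                // The data itself is wrong, so say so with a specific type
                // instead of a plain IOException.
                throw new CorruptDataException("invalid block length: " + length, null);
            }
            byte[] block = new byte[length];
            in.readFully(block);
            return block;
        } catch (EOFException e) {
            // Truncated input is reported as corruption; the cause is kept.
            throw new CorruptDataException("truncated block", e);
        }
    }
}
```

With this, the caller can catch the specific type for corruption and leave 
every other `IOException` (too many open files, etc.) on the low-level IO 
error path, without looking at messages.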

> throw more specific exception on data corruption
> ------------------------------------------------
>
>                 Key: LUCENE-8525
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8525
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Vladimir Dolzhenko
>            Priority: Major
>
> DataInput throws generic IOException if data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like 
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
>  
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
>  and maybe 
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch 
> #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it threw a more specific exception.
> As a consequence 
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
>  violates its own contract
> {code:java}
> /**
>    * @throws CorruptIndexException if the index is corrupt
>    * @throws IOException if there is a low-level IO error
>    */
> {code}



