[
https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736802#comment-16736802
]
Armin Braun commented on LUCENE-8525:
-------------------------------------
{quote}If the bits were wrong, we don't know why. Maybe it's just a hardware
memory issue, maybe hotspot compiled the code wrong, maybe it's a bug in lucene
code, maybe it's something else.
So I think if we hit EOF, the correct thing to do is throw EOFException, that's
as specific as it gets.
{quote}
And that's fine. As I said
[above|https://issues.apache.org/jira/browse/LUCENE-8525?focusedCommentId=16736342&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16736342]
EOF is close and Lucene not handling it is ok. What I'm looking for is simply
some differentiation between "wrong bytes" and "other IO issue". Whether the
bytes are only temporarily wrong isn't really important; it's probably an
unrecoverable issue either way. What is important is that there is a
way to figure out whether Lucene read broken bytes or was unable to read
bytes at all.
EOF is the somewhat ambiguous corner case here, as you point out, and it's fine
not to handle it in Lucene, since I don't see how always interpreting it as
corrupt data would ever produce a meaningful error rate. But if Lucene throws EOF
when, e.g., we fail to read the number of bytes some 'length' field suggested we
can read, but throws a plain IOException when some bits don't match what we
expected, then that's:
a. somewhat inconsistent
b. really hard to handle properly in the calling code
isn't it?
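To make the calling-code side concrete, here is a minimal sketch of the kind of handling I have in mind. It is not Lucene code: CorruptDataException, MAGIC, readRecord and classify are hypothetical names made up for illustration, with CorruptDataException standing in for a dedicated corruption exception. The caller treats "wrong bytes" and EOF as corruption, and any other IOException as an ordinary IO failure.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class ReadFailures {
    /** Hypothetical stand-in for a dedicated "the bytes are wrong" exception. */
    static class CorruptDataException extends IOException {
        CorruptDataException(String message) { super(message); }
    }

    enum Kind { OK, CORRUPT_DATA, IO_ERROR }

    /** Hypothetical header value; any fixed constant would do. */
    static final int MAGIC = 0xCAFEF00D;

    /** Reads one record laid out as: magic (int), length (int), payload bytes. */
    static byte[] readRecord(DataInputStream in) throws IOException {
        int magic = in.readInt();
        if (magic != MAGIC) {
            // The bytes were readable but do not match what we expected.
            throw new CorruptDataException("bad magic: 0x" + Integer.toHexString(magic));
        }
        int length = in.readInt();
        if (length < 0) {
            throw new CorruptDataException("negative length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // throws EOFException if fewer bytes are available
        return payload;
    }

    /** Caller-side policy: EOF is always interpreted as corrupt data. */
    static Kind classify(IOException e) {
        if (e instanceof CorruptDataException || e instanceof EOFException) {
            return Kind.CORRUPT_DATA;
        }
        return Kind.IO_ERROR;
    }

    static Kind classifyRead(InputStream raw) {
        try (DataInputStream in = new DataInputStream(raw)) {
            readRecord(in);
            return Kind.OK;
        } catch (IOException e) {
            return classify(e);
        }
    }

    public static void main(String[] args) {
        byte[] badMagic = ByteBuffer.allocate(8).putInt(0x12345678).putInt(0).array();
        // The length field claims 8 payload bytes, but only 2 follow.
        byte[] truncated = ByteBuffer.allocate(10)
                .putInt(MAGIC).putInt(8).put((byte) 1).put((byte) 2).array();
        InputStream failing = new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("simulated device failure");
            }
        };
        System.out.println(classifyRead(new ByteArrayInputStream(badMagic)));
        System.out.println(classifyRead(new ByteArrayInputStream(truncated)));
        System.out.println(classifyRead(failing));
    }
}
```

If the bad-magic case threw a plain IOException instead, classify would have no way to tell it apart from the simulated device failure short of parsing exception messages, which is the handling problem described in point b.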
> throw more specific exception on data corruption
> ------------------------------------------------
>
> Key: LUCENE-8525
> URL: https://issues.apache.org/jira/browse/LUCENE-8525
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Vladimir Dolzhenko
> Priority: Major
>
> DataInput throws generic IOException if data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
>
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
> and maybe
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch
> #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it threw a more specific exception.
> As a consequence
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
> violates its own contract
> {code:java}
> /**
> * @throws CorruptIndexException if the index is corrupt
> * @throws IOException if there is a low-level IO error
> */
> {code}