[
https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736426#comment-16736426
]
Armin Braun commented on LUCENE-8525:
-------------------------------------
{quote}If there is a data corruption in data input (that can be detected) then
it's really an exceptional situation: you can't reasonably recover from it. I
really can't think of a scenario where it'd be reasonable to declare an
exception like "VLongEncodingInvalid" or something like that.
{quote}
I think the whole point here is that there's simply a huge practical difference
between running into e.g. "too many open files", which is a system-level issue
ES can deal with, and running into corrupt data and being thrown "Invalid vInt
detected (too many bits)".
The latter is a permanent and in all likelihood unrecoverable issue with the
data, and in the practical problem at hand here it would lead ES to act on this
information and mark the data as corrupt. The former is something that can be
recovered from and handled accordingly.
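To make the distinction concrete, here is a minimal sketch of what a caller such as ES could do if corruption were signalled by its own exception type. CorruptDataException, Action and decide are hypothetical names used only for illustration; they are not existing Lucene or ES APIs.

```java
import java.io.IOException;

class CorruptionHandlingSketch {

    // Hypothetical subtype (an assumption, not an existing Lucene class):
    // signals that the bytes themselves are permanently broken.
    static class CorruptDataException extends IOException {
        CorruptDataException(String message) {
            super(message);
        }
    }

    // The two reactions described above.
    enum Action { MARK_CORRUPT, RETRY_LATER }

    // Decide by exception type rather than by parsing the message text.
    static Action decide(IOException e) {
        return (e instanceof CorruptDataException) ? Action.MARK_CORRUPT : Action.RETRY_LATER;
    }

    public static void main(String[] args) {
        System.out.println(decide(new CorruptDataException("Invalid vInt detected (too many bits)")));
        System.out.println(decide(new IOException("Too many open files")));
    }
}
```

With only a generic IOException available, decide would have to inspect the message string, which is the brittle part.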
{quote}The exception's message, not its type, carries the details (for logging,
for example).
{quote}
I think the problem here really is that the details are important logically and
not simply for logging output. It's not the difference between "too many open
files" and, say, "No space left on device", which can both be logged and handled
by the user accordingly.
Rather, the situation is analogous to, say, the difference between SSLException
(problem with the specific bytes handled) and SocketTimeoutException (resource
issue), where both are IOExceptions that you simply need to be able to tell
apart programmatically in many cases. To me it seems like the problem here is
the same: if I can't programmatically tell whether I need to free up system
resources or whether my data is unrecoverably broken, then that seems wrong.
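As a rough illustration of the JDK analogy, the sketch below tells the two cases apart purely by catch clause. SSLException and SocketTimeoutException are the real JDK types; the Call interface, the classify method and the returned labels are made up for this example.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import javax.net.ssl.SSLException;

class IoFailureKindSketch {

    // Hypothetical stand-in for any operation that may fail with an IOException.
    interface Call {
        void run() throws IOException;
    }

    // Both specific types are IOExceptions, yet the caller can branch on them
    // without looking at the message.
    static String classify(Call call) {
        try {
            call.run();
            return "ok";
        } catch (SSLException e) {
            return "bad-bytes";   // problem with the specific bytes handled
        } catch (SocketTimeoutException e) {
            return "resource";    // resource issue, may succeed on retry
        } catch (IOException e) {
            return "unknown";     // generic type: no way to tell which case applies
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { throw new SSLException("bad record"); }));
        System.out.println(classify(() -> { throw new SocketTimeoutException("read timed out"); }));
        System.out.println(classify(() -> { throw new IOException("Invalid vInt detected (too many bits)"); }));
    }
}
```

The last call is the situation in this issue: corruption arrives as a plain IOException and falls into the "unknown" branch.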
> throw more specific exception on data corruption
> ------------------------------------------------
>
> Key: LUCENE-8525
> URL: https://issues.apache.org/jira/browse/LUCENE-8525
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Vladimir Dolzhenko
> Priority: Major
>
> DataInput throws generic IOException if data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
>
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
> and maybe
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch
> #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it threw a more specific exception.
> As a consequence
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
> violates its own contract
> {code:java}
> /**
> * @throws CorruptIndexException if the index is corrupt
> * @throws IOException if there is a low-level IO error
> */
> {code}