[
https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784438#comment-16784438
]
Simon Willnauer commented on LUCENE-8692:
-----------------------------------------
{noformat}
I think there is an issue with the patch with MergeAbortedException indeed given
that registerMerge might throw such an exception. Maybe we should move this try
block to registerMerge instead where we know which OneMerge is being registered
(and is also where the exception is thrown when estimating the size of the
merge).
{noformat}
+1
{code:java}
- } catch (VirtualMachineError tragedy) {
+ } catch (Throwable tragedy) {
tragicEvent(tragedy, "startCommit");
{code}
I am not sure why we need to treat every exception as fatal in this case?
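To make the question concrete, here is a minimal sketch of how the two catch clauses differ. It is a toy model, not Lucene's IndexWriter: ToyWriter, isTragicAfter and the catchEveryThrowable flag are hypothetical names, and the IllegalArgumentException stands in for any failure a caller could recover from.

```java
class TragicCatchSketch {

    /** Hypothetical stand-in for a writer; not Lucene's IndexWriter. */
    static final class ToyWriter {
        private final boolean catchEveryThrowable;
        private Throwable tragedy;

        ToyWriter(boolean catchEveryThrowable) {
            this.catchEveryThrowable = catchEveryThrowable;
        }

        Throwable getTragicException() {
            return tragedy;
        }

        /** Runs the commit work and records a tragedy according to the chosen policy. */
        void startCommit(Runnable work) {
            try {
                work.run();
            } catch (Throwable t) {
                // Narrow policy: only VirtualMachineError (e.g. OutOfMemoryError) is tragic.
                // Broad policy: anything thrown here is tragic.
                if (catchEveryThrowable || t instanceof VirtualMachineError) {
                    tragedy = t;
                }
                throw t; // precise rethrow: work.run() can only throw unchecked throwables
            }
        }
    }

    /** Returns whether a failing commit left the toy writer in a tragic state. */
    static boolean isTragicAfter(boolean catchEveryThrowable, RuntimeException failure) {
        ToyWriter writer = new ToyWriter(catchEveryThrowable);
        try {
            writer.startCommit(() -> { throw failure; });
        } catch (RuntimeException expected) {
            // The caller sees the failure under both policies.
        }
        return writer.getTragicException() != null;
    }

    public static void main(String[] args) {
        RuntimeException recoverable = new IllegalArgumentException("bad commit user data");
        System.out.println("narrow catch tragic? " + isTragicAfter(false, recoverable));
        System.out.println("broad catch tragic?  " + isTragicAfter(true, recoverable));
    }
}
```

In this toy, the broad clause marks the writer as tragic even for a failure that has nothing to do with index corruption, which is the concern raised above.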
I also wonder if we could move this to a PR on GitHub; iterations would be
simpler, and comments too. I can't tell which patch is relevant and which one isn't.
> IndexWriter.getTragicException() may not reflect all corrupting exceptions
> (notably: NoSuchFileException)
> ---------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-8692
> URL: https://issues.apache.org/jira/browse/LUCENE-8692
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Hoss Man
> Priority: Major
> Attachments: LUCENE-8692.patch, LUCENE-8692.patch,
> LUCENE-8692_test.patch
>
>
> Backstory...
> Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's
> {{corruptFiles}} to introduce corruption into the "leader" node's index and
> then assert that this Solr node gives up its leadership of the shard and
> another replica takes over.
> This can currently fail sporadically (but usually reproducibly - see
> SOLR-13237) due to the leader not giving up its leadership even after the
> corruption causes an update/commit to fail. Solr's leadership code makes
> this decision after encountering an exception from the IndexWriter based on
> whether {{IndexWriter.getTragicException()}} is (non-)null.
> ----
> While investigating this, I created an isolated Lucene-Core equivalent test
> that demonstrates the same basic situation:
> * Gradually cause corruption on an index until (otherwise) valid execution
> of IW.add() + IW.commit() calls throws an exception to the IW client.
> * assert that if an exception is thrown to the IW client,
> {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly -- in every situation
> I've seen the underlying exception is a {{NoSuchFileException}} (i.e., the
> randomly introduced corruption was to delete some file).
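
The invariant described in the two bullets above can be sketched without Lucene. This is only a toy model under stated assumptions: ToyWriter, invariantHolds, the recordIoFailures flag and the file names are all hypothetical, the "directory" is an in-memory set, and IW.add() is left out so that only commit() can fail.

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

class TragicInvariantSketch {

    // Hypothetical file names: the toy commit needs the first list, not the second.
    static final List<String> REQUIRED = Arrays.asList("segments_1", "_0.si", "_0.cfs");
    static final List<String> EXTRAS = Arrays.asList("extra_a", "extra_b");

    /** Hypothetical stand-in for IndexWriter over an in-memory "directory". */
    static final class ToyWriter {
        private final Set<String> files;
        private final boolean recordIoFailures;
        private Throwable tragedy;

        ToyWriter(Set<String> files, boolean recordIoFailures) {
            this.files = files;
            this.recordIoFailures = recordIoFailures;
        }

        Throwable getTragicException() {
            return tragedy;
        }

        /** Fails with NoSuchFileException as soon as a required file is missing. */
        void commit() throws IOException {
            for (String required : REQUIRED) {
                if (!files.contains(required)) {
                    IOException e = new NoSuchFileException(required);
                    if (recordIoFailures) {
                        tragedy = e; // only this policy remembers the failure
                    }
                    throw e;
                }
            }
        }
    }

    /**
     * Deletes one randomly chosen file per round, then commits. Returns true if the
     * first exception thrown to the caller was also visible via getTragicException().
     */
    static boolean invariantHolds(boolean recordIoFailures, long seed) {
        Set<String> files = new TreeSet<>(REQUIRED);
        files.addAll(EXTRAS);
        ToyWriter writer = new ToyWriter(files, recordIoFailures);
        List<String> victims = new ArrayList<>(files);
        Random random = new Random(seed);
        while (!victims.isEmpty()) {
            files.remove(victims.remove(random.nextInt(victims.size())));
            try {
                writer.commit();
            } catch (IOException thrownToClient) {
                return writer.getTragicException() != null;
            }
        }
        return true; // never reached here: a required file is always deleted eventually
    }

    public static void main(String[] args) {
        System.out.println("records I/O failures: " + invariantHolds(true, 42L));
        System.out.println("ignores I/O failures: " + invariantHolds(false, 42L));
    }
}
```

A writer that does not record the NoSuchFileException breaks the invariant as soon as a required file disappears, which mirrors the failure mode reported here.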