[
https://issues.apache.org/jira/browse/SOLR-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698792#comment-14698792
]
Erick Erickson commented on SOLR-7836:
--------------------------------------
Poking a little more: opening a new searcher in add happens only when
clearCaches==true, and that is only passed explicitly in DUH2.addAndDelete, which
is where all this started. There's also a call in the CDCR code that passes a
variable in, but I don't think that's really relevant.
It's simple enough to move opening a new searcher up to these two places; I'll
give it a try to evaluate. I don't like that solution much since it's trappy: a
new call to add(cmd, true) that fails to open a new searcher could re-introduce
the problem that opening the searcher in its current location is designed to
prevent. I suppose a big fat warning is in order?
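To make the "trappy" concern concrete, here is a minimal toy sketch of what the
API would look like once the searcher open is moved out of add and up to the
call sites. Everything in it is hypothetical: the class, the String stand-in
for a command, and the "stale" flag are invented for illustration and are not
Solr's actual update handler or update log code. It only shows that a caller
who passes clearCaches=true and forgets the follow-up open is not stopped by
anything.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Solr's real classes or signatures.
class SearcherPlacementSketch {
    private final List<String> updates = new ArrayList<>(); // stand-in for logged updates
    private int searcherOpens = 0;
    private boolean stale = false; // invented flag: "a new searcher is still owed"

    // After the proposed move, add() no longer opens a searcher itself.
    // With clearCaches == true the caller is now responsible for doing so.
    void add(String cmd, boolean clearCaches) {
        updates.add(cmd);
        if (clearCaches) {
            stale = true;
        }
    }

    void openNewSearcher() {
        searcherOpens++;
        stale = false;
    }

    // A call site that honors the new contract (modeled loosely on the
    // addAndDelete path mentioned above).
    void addAndDelete(String cmd) {
        add(cmd, true);
        openNewSearcher();
    }

    // The trap: a later call site that passes true but never opens a searcher.
    void forgetfulCaller(String cmd) {
        add(cmd, true);
    }

    boolean isStale() {
        return stale;
    }

    int searcherOpens() {
        return searcherOpens;
    }

    public static void main(String[] args) {
        SearcherPlacementSketch ok = new SearcherPlacementSketch();
        ok.addAndDelete("doc1");
        System.out.println("careful caller stale? " + ok.isStale());

        SearcherPlacementSketch bad = new SearcherPlacementSketch();
        bad.forgetfulCaller("doc2");
        System.out.println("forgetful caller stale? " + bad.isStale());
    }
}
```

Nothing in the compiler or the method signature flags the forgetful caller,
which is why a prominent warning on add would be the only guard in this layout.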
Let me try it just to see whether it cures things or not. I'm pretty sure it'll
cure the deadlock problem. I'll first just comment out the openSearcher and see
if I can blow up the realtime get tests, then move the open out and see whether
either the realtime get tests or the new deadlock test fail with the
reorganized code. Once I've collected that data we can discuss some more. I'll
probably have something later today.
[[email protected]] Those numbers in the new test were chosen completely
arbitrarily. I'm guessing that the point of your changes is to drive the
failure more often without lengthening the time the test takes, so I'll
incorporate them.
> Possible deadlock when closing refcounted index writers.
> --------------------------------------------------------
>
> Key: SOLR-7836
> URL: https://issues.apache.org/jira/browse/SOLR-7836
> Project: Solr
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Fix For: Trunk, 5.4
>
> Attachments: SOLR-7836-synch.patch, SOLR-7836.patch, SOLR-7836.patch,
> SOLR-7836.patch, deadlock_3.res.zip, deadlock_5_pass_iw.res.zip, deadlock_test
>
>
> Preliminary patch for what looks like a possible race condition between
> writerFree and pauseWriter in DefaultSolrCoreState.
> Looking for comments and/or why I'm completely missing the boat.