[
https://issues.apache.org/jira/browse/SOLR-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680200#comment-14680200
]
Erick Erickson commented on SOLR-7836:
--------------------------------------
bq: I'm pretty sure that deadlocks around accessing an index writer should not
involve synchronization work with the tlog. It may have inadvertently helped,
but the two things are pretty unrelated.
I don't disagree, but the update log and index writer are intertwined; that's
the problem. I'm perfectly willing to agree that they should be separated out
completely, but haven't had any confirmation that they can be, or were ever
intended to be separated.
ulog.add() calls openNewSearcher, which gets an indexWriter, which is where
things go south. Of course it calls getIndexWriter with null, which has the note
"core == null is a signal to just return the current writer, or null". It
doesn't really increment the reference count but does go through the interlock
with pauseWriter and the like. Of course openNewSearcher then does a decref on
the writer, which was never incremented in the first place, and that only works
because the decref for the index writer doesn't decrement if the count is 0.
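To make the unmatched decref concrete, here is a minimal sketch of the pattern
described above. It is not the Solr code: CountedWriter, currentWriterNoIncref
and acquire are hypothetical names invented for illustration, and the guard in
decref is only my reading of "doesn't decrement if the count is 0".

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedRefCountSketch {

    /** Hypothetical stand-in for a ref-counted writer holder; not Solr's class. */
    static final class CountedWriter {
        private final AtomicInteger refs = new AtomicInteger(0);

        void incref() {
            refs.incrementAndGet();
        }

        /** Guarded decref: leaves the count alone when it is already 0. */
        void decref() {
            refs.updateAndGet(n -> n > 0 ? n - 1 : 0);
        }

        int count() {
            return refs.get();
        }
    }

    /** Assumed "core == null" style path: returns the writer without an incref. */
    static CountedWriter currentWriterNoIncref(CountedWriter w) {
        return w;
    }

    /** Assumed normal path: increments before handing the writer back. */
    static CountedWriter acquire(CountedWriter w) {
        w.incref();
        return w;
    }

    public static void main(String[] args) {
        CountedWriter w = new CountedWriter();

        // Unmatched decref: nothing was incremented, so the guard keeps it at 0.
        currentWriterNoIncref(w).decref();
        System.out.println("after unmatched decref: " + w.count());

        // Matched incref/decref pair for comparison.
        acquire(w).decref();
        System.out.println("after matched pair: " + w.count());
    }
}
```

In this sketch the count never goes negative, which is the only reason the
unmatched decref is harmless here.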
I've no objection to taking the two additional synchronized blocks out of
DirectUpdateHandler2. The one in addAndDelete was already there although it was
enclosed by getting an index writer (which is where all the problems happened).
I'm not averse to taking that one out too.
BTW, you can't use tests.iters for the new test. I didn't want to wait for the
default suite timeout so I set it locally to 10 minutes and that timer
apparently runs across all iters. I wrote a shell script to re-invoke the test
for a long time (500 times last night).
> Possible deadlock when closing refcounted index writers.
> --------------------------------------------------------
>
> Key: SOLR-7836
> URL: https://issues.apache.org/jira/browse/SOLR-7836
> Project: Solr
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Fix For: Trunk, 5.4
>
> Attachments: SOLR-7836-synch.patch, SOLR-7836.patch, SOLR-7836.patch,
> SOLR-7836.patch
>
>
> Preliminary patch for what looks like a possible race condition between
> writerFree and pauseWriter in DefaultSolrCoreState.
> Looking for comments and/or why I'm completely missing the boat.