[ 
https://issues.apache.org/jira/browse/SOLR-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680200#comment-14680200
 ] 

Erick Erickson commented on SOLR-7836:
--------------------------------------

bq: I'm pretty sure that deadlocks around accessing an index writer should not 
involve synchronization work with the tlog. It may have inadvertently helped, 
but the two things are pretty unrelated.

I don't disagree, but the update log and index writer are intertwined; that's 
the problem. I'm perfectly willing to agree that they should be separated out 
completely, but haven't had any confirmation that they can be, or were ever 
intended to be separated.

ulog.add() calls openNewSearcher, which gets an indexWriter, and that is where 
things go south. It calls getIndexWriter with null, which carries the note 
"core == null is a signal to just return the current writer, or null". That 
path doesn't actually increment the reference count, but it does go through 
the interlock with pauseWriter and the like. openNewSearcher then does a 
decref on the writer, which was never incremented in the first place; this 
only works because the decref for the index writer doesn't decrement if the 
count is 0.
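
To make the reference-count quirk above concrete, here is a minimal sketch. 
It is not Solr's code: RefCountSketch, CountedWriter and getWriter are 
hypothetical stand-ins, and the pauseWriter interlock is left out entirely. 
It only assumes the two behaviors described above, namely that a null core 
returns the writer without taking a reference, and that decref is a no-op 
when the count is already 0.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; these are not Solr's actual classes.
public class RefCountSketch {

    // Stand-in for a ref-counted index writer handle.
    static class CountedWriter {
        private final AtomicInteger refs = new AtomicInteger(0);

        void incref() {
            refs.incrementAndGet();
        }

        // Assumed behavior from the comment above: never drop below zero.
        void decref() {
            refs.updateAndGet(n -> n > 0 ? n - 1 : 0);
        }

        int count() {
            return refs.get();
        }
    }

    private final CountedWriter writer = new CountedWriter();

    // Hypothetical analogue of getIndexWriter(core): a null core means
    // "just return the current writer" without taking a reference.
    CountedWriter getWriter(Object core) {
        if (core != null) {
            writer.incref();
        }
        return writer;
    }

    public static void main(String[] args) {
        RefCountSketch state = new RefCountSketch();

        // Unbalanced path: no incref, then a decref that silently does nothing.
        CountedWriter w = state.getWriter(null);
        w.decref();
        System.out.println("after unbalanced decref: " + w.count()); // prints 0

        // Balanced path: incref on get, decref on release.
        CountedWriter w2 = state.getWriter(new Object());
        System.out.println("after counted get: " + w2.count()); // prints 1
        w2.decref();
        System.out.println("after balanced decref: " + w2.count()); // prints 0
    }
}
```

The unbalanced path ends at the same count as the balanced one only because 
of the floor at zero. Without that guard the count would go negative.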

I've no objection to taking the two additional synchronized blocks out of 
DirectUpdateHandler2. The one in addAndDelete was already there although it was 
enclosed by getting an index writer (which is where all the problems happened). 
I'm not averse to taking that one out too.

BTW, you can't use tests.iters for the new test. I didn't want to wait for the 
default suite timeout so I set it locally to 10 minutes and that timer 
apparently runs across all iters. I wrote a shell script to re-invoke the test 
for a long time (500 times last night).

> Possible deadlock when closing refcounted index writers.
> --------------------------------------------------------
>
>                 Key: SOLR-7836
>                 URL: https://issues.apache.org/jira/browse/SOLR-7836
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>             Fix For: Trunk, 5.4
>
>         Attachments: SOLR-7836-synch.patch, SOLR-7836.patch, SOLR-7836.patch, 
> SOLR-7836.patch
>
>
> Preliminary patch for what looks like a possible race condition between 
> writerFree and pauseWriter in DefaultSolrCoreState.
> Looking for comments and/or why I'm completely missing the boat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)