[ https://issues.apache.org/jira/browse/SOLR-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559516#comment-14559516 ]
Timothy Potter commented on SOLR-7587:
--------------------------------------
Have a little more information about what caused this failure. Had to dig into
the JavaDoc for ReentrantReadWriteLock a bit and found this little gem:
{quote}
Reentrancy also allows downgrading from the write lock to a read lock, by
acquiring the write lock, then the read lock and then releasing the write lock.
However, upgrading from a read lock to the write lock is not possible.
{quote}
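The quoted rule can be checked with a minimal sketch that uses only the JDK's java.util.concurrent.locks.ReentrantReadWriteLock. The names LockUpgradeDemo, downgradeWorks and upgradeWorks are made up for illustration. The upgrade attempt uses tryLock() so it returns false rather than hanging the way a plain lock() would:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrates the Javadoc rule: write -> read (downgrade) is allowed,
// read -> write (upgrade) is not.
class LockUpgradeDemo {

    // Downgrade: take the write lock, then the read lock, then drop the write lock.
    static boolean downgradeWorks() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            lock.readLock().lock(); // allowed: this thread owns the write lock
        } finally {
            lock.writeLock().unlock(); // now holding only the read lock
        }
        try {
            // Exactly one read hold remains and nobody holds the write lock.
            return lock.getReadHoldCount() == 1 && !lock.isWriteLocked();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Upgrade attempt: while holding the read lock, ask for the write lock.
    static boolean upgradeWorks() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() fails immediately because a read lock is held (by us).
            // A plain lock() would wait forever: our own read hold never goes away.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("downgrade ok: " + downgradeWorks()); // true
        System.out.println("upgrade ok:   " + upgradeWorks());   // false
    }
}
{code}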
All the test failures caused by this situation occurred during a commit.
Commits acquire a read-lock on the VersionInfo object (see the
{{DistributedUpdateProcessor#versionAdd}} method). My code introduced the need
to acquire the write-lock, and as we learned above, you can't upgrade a
read-lock to a write-lock. The problem is where I put this code: I hung it off
the code that handles {{firstSearcher}} events, since I need a searcher in
order to look up the max value in the index to seed the version buckets with.
But if that were the whole story, the test should fail every time, which is
not the case, so there's clearly some timing involved in this failure. This
code only fires when {{currSearcher == null}}, and I don't see how that could
be true at the point where the test sends a commit (see the stack trace below).
{code}
at org.apache.solr.update.VersionInfo.blockUpdates(VersionInfo.java:118)
at org.apache.solr.update.UpdateLog.onFirstSearcher(UpdateLog.java:1604)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1810)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1505)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:617)
- locked <0x00000000f6f09a10> (a java.lang.Object)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2051)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:179)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:502)
at org.apache.solr.client.solrj.response.TestSpellCheckResponse.testSpellCheckResponse(TestSpellCheckResponse.java:51)
{code}
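The call chain in that trace can be modeled roughly as follows, assuming (as described above) that the commit path already holds the VersionInfo read-lock when the firstSearcher hook asks for the write-lock. CommitDeadlockModel and its methods are hypothetical; only the names blockUpdates and onFirstSearcher echo the trace, and their bodies here are guesses. A timed tryLock stands in for a blocking acquire so the sketch reports the failure instead of stalling the way the test did:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the commit -> getSearcher -> onFirstSearcher -> blockUpdates
// chain. Method bodies are assumptions, not Solr's actual code.
class CommitDeadlockModel {
    private final ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock();

    // Read lock that, per the comment above, commit processing holds on VersionInfo.
    void lockForUpdate()   { versionLock.readLock().lock(); }
    void unlockForUpdate() { versionLock.readLock().unlock(); }

    // Stand-in for blockUpdates(). A blocking lock() would wait forever here;
    // the timed tryLock returns false once the timeout expires.
    boolean blockUpdates(long timeoutMs) {
        try {
            return versionLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
    void unblockUpdates() { versionLock.writeLock().unlock(); }

    // Stand-in for the firstSearcher hook that seeds the version buckets.
    boolean onFirstSearcher() {
        if (!blockUpdates(200)) {
            return false; // a blocking acquire would have stalled at this point
        }
        try {
            // ... look up the max value in the index and seed the buckets ...
            return true;
        } finally {
            unblockUpdates();
        }
    }

    // Commit path: take the read lock, then fire the hook only if no searcher exists yet.
    boolean commit(boolean currSearcherIsNull) {
        lockForUpdate();
        try {
            return currSearcherIsNull ? onFirstSearcher() : true;
        } finally {
            unlockForUpdate();
        }
    }

    public static void main(String[] args) {
        CommitDeadlockModel model = new CommitDeadlockModel();
        System.out.println(model.commit(false));     // true: hook not fired
        System.out.println(model.commit(true));      // false: read -> write upgrade refused
        System.out.println(model.onFirstSearcher()); // true: no read lock held
    }
}
{code}

In this model the hang only happens when {{currSearcherIsNull}} is true, which is consistent with the failure depending on timing rather than showing up on every run.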
The searcher gets registered asynchronously via futures, but it seems unlikely
that the test could get this far before the searcher opened during core
initialization is set as the currSearcher. At any rate, the patch I submitted
moves the bucket-seeding code (which needs a write-lock) out of the
firstSearcher code path and into the SolrCore ctor, which removes the need for
a write-lock while a read-lock has already been acquired for a commit
operation. It's still an open question in my mind how the test can get to
sending a commit while {{currSearcher == null}} ... any thoughts on that?
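The shape of the fix can be sketched the same way: do the write-locked seeding once, in the constructor, before any commit can be holding the read-lock. This is a hypothetical model, not the code in SOLR-7587.patch; SeedInConstructorModel and the maxVersionFromIndex supplier are stand-ins for SolrCore and the index lookup:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

// Hypothetical sketch: seed once under the write lock during construction,
// so the commit path only ever needs the read lock.
class SeedInConstructorModel {
    private final ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock();
    private long seededMaxVersion; // guarded by versionLock

    // maxVersionFromIndex is a stand-in for "look up the max value in the index".
    SeedInConstructorModel(LongSupplier maxVersionFromIndex) {
        // No commit can be holding the read lock yet, so this cannot self-deadlock.
        versionLock.writeLock().lock();
        try {
            seededMaxVersion = maxVersionFromIndex.getAsLong();
        } finally {
            versionLock.writeLock().unlock();
        }
    }

    // Commit path: read lock only; no write-lock acquisition is reachable from here.
    long commit() {
        versionLock.readLock().lock();
        try {
            return seededMaxVersion;
        } finally {
            versionLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SeedInConstructorModel core = new SeedInConstructorModel(() -> 1_000L);
        System.out.println(core.commit()); // 1000
    }
}
{code}

The write-lock in the constructor is not strictly required for an object no other thread can see yet; it is kept only to mirror the point that seeding is the step that takes the write-lock, and that it now runs where no read-lock can already be held.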
> TestSpellCheckResponse stalled and never timed out -- possible VersionBucket bug? (5.2 branch)
> ----------------------------------------------------------------------------------------------
>
> Key: SOLR-7587
> URL: https://issues.apache.org/jira/browse/SOLR-7587
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Assignee: Timothy Potter
> Priority: Blocker
> Fix For: 5.2
>
> Attachments: SOLR-7587.patch, jstack.1.txt, jstack.2.txt,
> junit4-J0-20150522_181244_599.events, junit4-J0-20150522_181244_599.spill,
> junit4-J0-20150522_181244_599.suites
>
>
> On the 5.2 branch (r1681250), I encountered a SolrJ test that stalled for
> over 110 minutes before I finally killed it...
> {noformat}
> [junit4] Suite: org.apache.solr.common.util.TestRetryUtil
> [junit4] Completed [55/60] on J1 in 1.04s, 1 test
> [junit4]
> [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T18:14:56, stalled for 121s at: TestSpellCheckResponse.testSpellCheckResponse
> [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T18:15:56, stalled for 181s at: TestSpellCheckResponse.testSpellCheckResponse
> ...
> [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T20:00:56, stalled for 6481s at: TestSpellCheckResponse.testSpellCheckResponse
> [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T20:01:56, stalled for 6541s at: TestSpellCheckResponse.testSpellCheckResponse
> [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T20:02:56, stalled for 6601s at: TestSpellCheckResponse.testSpellCheckResponse
> {noformat}
> I'll attach some jstack output as well as all the temp files from the J0
> runner.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)