[
https://issues.apache.org/jira/browse/SOLR-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jan Høydahl updated SOLR-18174:
-------------------------------
Attachment: (was: threads-healthrecord-test-node-0.json)
> AsyncTracker Semaphore leak on LBAsyncSolrClient retries
> --------------------------------------------------------
>
> Key: SOLR-18174
> URL: https://issues.apache.org/jira/browse/SOLR-18174
> Project: Solr
> Issue Type: Bug
> Components: SolrJ
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
>
> Experienced completely deadlocked distributed requests on Solr 9.10.1 several
> times in production, roughly once every couple of days. A Solr restart resolved
> the issue. This started happening immediately after upgrading from Solr 9.7 to
> 9.10.
> I had Claude make an analysis of what could be happening, see
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406622977]
> . This identifies several code changes related to distributed search between
> those versions, involving SOLR-17819, SOLR-17792 and SOLR-17776, which changed
> behavior with cancelAll and request.abort during aborted or failed queries.
> This could lead to a semaphore leak, at least temporarily for 10 min.
> Later we were able to catch an internal test environment in the failure
> state, and were able to take thread dumps for the two nodes in the cluster
> (attached). Analyzing these with Claude identified another failure mode:
> LBHttp2SolrClient has retry logic if the first request fails, and it will
> spawn a new request which obtains another Semaphore permit without first
> releasing the permit obtained for the original request. The net result is
> that the original permit is leaked. A description of this failure scenario
> will be presented in a Pull Request, which also shows reproduction and a fix.
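
To make the suspected failure mode concrete, here is a minimal, hypothetical sketch of the permit-leak pattern described above. The class and method names (AsyncTracker, register, complete) are illustrative stand-ins, not the actual SolrJ code; the point is only that the buggy retry path acquires a second permit while the failed attempt's permit is never released.

```java
import java.util.concurrent.Semaphore;

// Simplified stand-in for the tracker that limits in-flight async requests.
class AsyncTracker {
    private final Semaphore available = new Semaphore(10);

    void register() throws InterruptedException { available.acquire(); }
    void complete() { available.release(); }
    int permits() { return available.availablePermits(); }
}

public class LeakDemo {
    public static void main(String[] args) throws Exception {
        AsyncTracker tracker = new AsyncTracker();

        // First attempt: acquires a permit, then fails.
        tracker.register();
        boolean firstAttemptFailed = true;

        if (firstAttemptFailed) {
            // Buggy retry path: acquires a second permit without
            // releasing the one still held by the failed attempt.
            tracker.register();
            tracker.complete(); // only the retry's permit is released
        }

        // One permit has leaked: 9 of the 10 permits remain available.
        System.out.println(tracker.permits()); // prints 9
    }
}
```

Under this model, each such retry permanently consumes one permit, so repeated failures eventually exhaust the semaphore and all distributed requests block, matching the observed deadlock.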
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]