janhoy commented on PR #4236:
URL: https://github.com/apache/solr/pull/4236#issuecomment-4122024576

   @dsmiley, @mlbiscoc, @kotman12, @HoustonPutman I'm not calling you out here 
to get a thorough review of this PR's code or of every theory in the mostly 
LLM-generated analysis.
   
   But you are bright minds, and I fear that this issue is a series of lurking 
bugs that will crop up once newer Solr releases see heavier production use. 
Perhaps some of you will connect some dots when seeing some of the code paths 
being discussed here.
   
   @HoustonPutman I ping you since you were bitten by the request 
cancellation/abort bug in 9.x, and I wonder if you could shed light on why 
`request.abort()` was also removed from the 10.x line.
   
   The actual bug fixed in this PR is quite serious, I believe. The 
LBSolrClient's retry request is executed synchronously on the IO selector 
thread instead of in the background, thus enabling the deadlock when semaphore 
permits are depleted.
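
   A minimal sketch of that failure mode, not the actual Solr code: a 
single-threaded executor stands in for the IO selector thread and a plain 
`Semaphore` stands in for the async request limit. All names here 
(`SelectorDeadlockSketch`, `runRetry`, `offloadRetry`) are hypothetical. The 
retry uses a timed `tryAcquire` so the demo terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SelectorDeadlockSketch {

    // Returns true if the retry obtained a permit before its timeout.
    static boolean runRetry(boolean offloadRetry) throws Exception {
        Semaphore permits = new Semaphore(1); // assumption: a limit of one in-flight request
        ExecutorService ioThread = Executors.newSingleThreadExecutor();   // stand-in for the selector thread
        ExecutorService background = Executors.newSingleThreadExecutor(); // stand-in for a worker pool
        try {
            permits.acquire(); // the first request holds the only permit
            CompletableFuture<Boolean> retryResult = new CompletableFuture<>();

            // Failure callback of the first request: runs on the IO thread and triggers a retry.
            ioThread.execute(() -> {
                Runnable retry = () -> {
                    try {
                        boolean got = permits.tryAcquire(500, TimeUnit.MILLISECONDS);
                        if (got) {
                            permits.release();
                        }
                        retryResult.complete(got);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        retryResult.complete(false);
                    }
                };
                if (offloadRetry) {
                    background.execute(retry); // IO thread stays free to process completions
                } else {
                    retry.run();               // blocks the IO thread while waiting for a permit
                }
            });

            // Completion callback that releases the first request's permit.
            // It also needs the IO thread, so it queues behind the callback above.
            ioThread.execute(() -> permits.release());

            return retryResult.get(5, TimeUnit.SECONDS);
        } finally {
            ioThread.shutdownNow();
            background.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("synchronous retry got permit: " + runRetry(false));
        System.out.println("offloaded retry got permit:   " + runRetry(true));
    }
}
```

   In the synchronous case the only thread able to release a permit is the one 
waiting for it, so with an untimed `acquire()` the wait would never end.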
   
   My $10000 question is how those permits leak in the first place, so I added 
a metric gauge for it. I suspect there is some code path acquiring a permit 
that is never released; so far I have some LLM theories but no evidence.
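
   As an illustration only, a gauge plus per-holder tracking could look roughly 
like the sketch below. This is not the AsyncTracker code and does not use 
Solr's metrics API; `TrackedPermits` and its methods are hypothetical names, and 
the request id used as a key is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.IntSupplier;

public class TrackedPermits {
    private final Semaphore semaphore;
    // Request id -> stack trace captured when the permit was acquired.
    private final Map<String, Throwable> holders = new ConcurrentHashMap<>();

    public TrackedPermits(int maxPermits) {
        this.semaphore = new Semaphore(maxPermits);
    }

    public void acquire(String requestId) throws InterruptedException {
        semaphore.acquire();
        holders.put(requestId, new Throwable("permit acquired by " + requestId));
    }

    public void release(String requestId) {
        // Release only if this request still holds a permit, so a double release is ignored.
        if (holders.remove(requestId) != null) {
            semaphore.release();
        }
    }

    // Value a metrics gauge could report periodically.
    public IntSupplier availablePermitsGauge() {
        return semaphore::availablePermits;
    }

    // Snapshot of permits not yet released; long-lived entries point at the leaking code path.
    public Map<String, Throwable> outstanding() {
        return Map.copyOf(holders);
    }
}
```

   Dumping `outstanding()` when the gauge stays low for a long time would show 
which acquire sites never reached a matching release.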
   
   I may build a custom Solr 9.10.2-SNAPSHOT with added logging and 
instrumentation around the AsyncTracker and deploy it to our test cluster, 
hoping for a reproduction, although it took 14 days of run time before it 
manifested last time. I would be thankful for advice on strategies for catching 
the root cause. The tricky thing is that it may be related to complex service 
mesh proxying and mass interruption of open connections.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


