janhoy commented on PR #4236: URL: https://github.com/apache/solr/pull/4236#issuecomment-4122024576
@dsmiley, @mlbiscoc, @kotman12, @HoustonPutman I'm not calling you out here to get a thorough review of this PR code or of every theory in the mostly LLM-generated analysis. But you are bright minds, and I fear that this issue is one of a series of lurking bugs that will crop up once heavier usage of newer Solr releases reaches prime time. Perhaps some of you will connect some dots when seeing some of the code paths being discussed here. @HoustonPutman I ping you since you were bitten by the request cancellation/abort bug in 9.x, and I wonder if you could shed light on why `request.abort()` was also removed from the 10.x line.

The actual bug fixed in this PR is quite serious, I believe. The LBSolrClient's retry request is executed synchronously on the IO selector thread instead of in the background, thus enabling the deadlock when semaphore permits are depleted. My $10000 question is how those permits leak in the first place, so I added a metric gauge for it. I suspect there is some code path that acquires a permit and never releases it, but so far I have only some LLM theories and no evidence.

I may build a custom Solr 9.10.2-SNAPSHOT with added logging and instrumentation around the AsyncTracker and deploy it to our test cluster, hoping for a reproduction, although it took 14 days of run time before it manifested last time... I'd be thankful for advice on strategies for catching the root cause. The tricky thing is that it may be related to complex service mesh proxying and mass interruption of open connections.
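To make the deadlock mechanism described above concrete, here is a minimal, self-contained sketch. It is not Solr code: `PermitDeadlockSketch`, `Tracker`, and `retryGetsPermit` are hypothetical names, a single-thread executor stands in for the IO selector thread, and a timed `tryAcquire` replaces the indefinitely blocking acquire so that the demo terminates instead of hanging. The assumption is that permits are only released by completion events that the selector thread itself must process.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

class PermitDeadlockSketch {

  /** Hypothetical stand-in for a semaphore-based request tracker (not Solr's AsyncTracker). */
  static final class Tracker {
    private final Semaphore permits;

    Tracker(int maxPermits) {
      this.permits = new Semaphore(maxPermits);
    }

    /** Tries to take a permit for a new request, waiting at most timeoutMs. */
    boolean tryStart(long timeoutMs) throws InterruptedException {
      return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Called when a request completes; returns its permit. */
    void complete() {
      permits.release();
    }

    /** Gauge-style supplier: a value stuck at 0 would hint at leaked permits. */
    IntSupplier availableGauge() {
      return permits::availablePermits;
    }
  }

  /**
   * Simulates a retry issued while the only permit is held by an in-flight request
   * whose completion event is queued on the same "selector" thread.
   * Returns true if the retry obtained a permit.
   */
  static boolean retryGetsPermit(boolean retryOnSelectorThread) throws Exception {
    Tracker tracker = new Tracker(1);
    ExecutorService selector = Executors.newSingleThreadExecutor();   // stand-in for the IO selector
    ExecutorService background = Executors.newSingleThreadExecutor(); // stand-in for a worker pool
    try {
      tracker.tryStart(0); // an in-flight request holds the only permit
      CompletableFuture<Boolean> retryResult = new CompletableFuture<>();
      Runnable retry = () -> {
        try {
          retryResult.complete(tracker.tryStart(200)); // the real code would block without timeout
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          retryResult.complete(false);
        }
      };
      // Event 1: failure listener fires on the selector thread and triggers the retry.
      selector.execute(() -> {
        if (retryOnSelectorThread) {
          retry.run();               // blocks the selector while waiting for a permit
        } else {
          background.execute(retry); // hands off and returns immediately
        }
      });
      // Event 2: completion of the in-flight request, queued behind event 1.
      selector.execute(tracker::complete);
      return retryResult.get(2, TimeUnit.SECONDS);
    } finally {
      selector.shutdownNow();
      background.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("retry on selector thread got permit: " + retryGetsPermit(true));
    System.out.println("retry on background thread got permit: " + retryGetsPermit(false));
  }
}
```

In this sketch the inline retry starves because the event that would release the permit sits behind it on the same thread, while the offloaded retry lets the selector keep draining its queue. The gauge supplier only shows the general idea of exposing `availablePermits()`; how the actual metric in the PR is wired up is not shown here.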
