gus-asf commented on PR #2379: URL: https://github.com/apache/solr/pull/2379#issuecomment-2029799436
One thing folks may want to review is the addition of synchronization around manipulations of HttpShardHandler#responseCancelableMap which were necessary to avoid CME such as: ``` { "responseHeader":{ "zkConnected":true, "status":500, "QTime":18, "params":{ "q":"document", "indent":"true", "q.op":"OR", "timeAllowed":"5", "allowPartialResults":"false", "useParams":"", "_":"1711320289606" } }, "error":{ "trace":"java.util.ConcurrentModificationException at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597) at java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1625) at org.apache.solr.handler.component.HttpShardHandler.cancelAll(HttpShardHandler.java:257) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:577) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:238) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2886) at org.apache.solr.servlet.HttpSolrCall.executeCoreRequest(HttpSolrCall.java:876) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:560) at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:254) at org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:215) at org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:247) at org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:211) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635) ``` This happens because the code is iterating an unsynchronized hash map when the request thread signals a cancel operation. I chose to also enclose pending.incrementAndGet() or decrementAndGet() and responses.take() and even though they are by themselves probablly safe, there's a loop on the value of pending > 0 and these three operations are logically linked so it seemed a bit dangerous to leave them independent. Also I noticed that ShardeResponse.code is being set, but the IDE flags it as "assigned but never read", and against the recommendation of the previously existing comments it's not volatile... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org