[ https://issues.apache.org/jira/browse/SOLR-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698160#comment-17698160 ]
Alex Deparvu commented on SOLR-16676:
-------------------------------------

It seems the trouble is that the request callbacks are called on different threads under some circumstances (I am assuming high load or memory pressure). Some assumptions I added no longer hold, and the following pattern is not correct:

```
MDCCopyHelper mdcCopyHelper = new MDCCopyHelper();
req.onRequestBegin(mdcCopyHelper);
// omitted for brevity
req.onComplete(mdcCopyHelper);
```

The failure happens as follows:
* new MDCCopyHelper() runs on the caller thread and picks up the correct MDC context (as expected).
* The onRequestBegin callback runs on thread 3: 'httpShardExecutor-18-thread-3'.
* 'response processing started' happens on a different thread, 'httpShardExecutor-18-thread-2', which has no MDC context available to push forward.

More often than not, the last two events happen on the same thread, which allows the MDC context to be copied over correctly. I am still evaluating options for the fix.

> Http2SolrClient loss of MDC context
> -----------------------------------
>
>                 Key: SOLR-16676
>                 URL: https://issues.apache.org/jira/browse/SOLR-16676
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrJ
>    Affects Versions: 9.0, 9.1
>            Reporter: Alex Deparvu
>            Priority: Minor
>             Fix For: 9.2
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The Http2SolrClient loses MDC context information when running an async
> request in Solr 9.x.
> The issue is that the 'Request#send' [0] call is itself asynchronous, and by the
> time the response listener kicks in to push the response processing to the
> executor, the MDC context is already lost, so the executor no longer has
> access to the original MDC in order to push it forward onto the thread that
> will process the response.
>
> This is very difficult to capture on a running system; there are no logs
> during this window.
> I only saw it because I was specifically looking at
> thread names for a different reason.
> This is how it is reflected in the thread names:
> - how it should be (Solr 8 style, containing all MDC data):
> {quote}{{httpShardExecutor-5-thread-19-processing-gettingstarted_shard2_replica_n2 core_node5 localhost:8983_solr gettingstarted shard2 localhost-4}}
> {quote}
> - how it is in Solr 9 (due to no MDC context):
> {quote}httpShardExecutor-5-thread-10
> {quote}
> I can't tell if there is anything breaking due to this.
> [0]
> [https://github.com/apache/solr/blob/7eee7a8ad3c43db0dc26c663dd16764d1fb3dbf4/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L458]
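The thread hop described in the comment can be reproduced outside Solr. The following is a minimal, hypothetical Java sketch under stated assumptions: it does not use Solr's actual `MDCCopyHelper`, a plain `ThreadLocal` map stands in for SLF4J's `MDC` so it runs with the JDK alone, and `ContextSnapshot`, `installOnCurrentThread` and `wrap` are illustrative names. It captures the context on the caller thread, installs it on one executor thread (as `onRequestBegin` would), and shows that a task on a second thread sees nothing unless it installs the captured snapshot itself. Wrapping each task is only one possible approach; the comment notes that the actual fix was still being evaluated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Solr code: a ThreadLocal map stands in for SLF4J's MDC.
final class MdcHandoffSketch {

  // Stand-in for org.slf4j.MDC: each thread has its own context map.
  static final ThreadLocal<Map<String, String>> CONTEXT = ThreadLocal.withInitial(HashMap::new);

  // Hypothetical helper: snapshots the context once, on the thread that constructs it.
  static final class ContextSnapshot {
    private final Map<String, String> captured = new HashMap<>(CONTEXT.get());

    // Fragile pattern: install on the current callback thread and assume that
    // later callbacks will run on this same thread.
    void installOnCurrentThread() {
      CONTEXT.set(new HashMap<>(captured));
    }

    // One possible alternative: every task installs the snapshot on whichever
    // thread runs it, then restores that thread's previous context.
    Runnable wrap(Runnable task) {
      return () -> {
        Map<String, String> previous = CONTEXT.get();
        CONTEXT.set(new HashMap<>(captured));
        try {
          task.run();
        } finally {
          CONTEXT.set(previous);
        }
      };
    }
  }

  // Returns {value seen on thread B without the snapshot, value seen by the wrapped task}.
  static String[] run() {
    // Two single-thread executors force the two steps onto two distinct threads.
    ExecutorService threadA = Executors.newSingleThreadExecutor();
    ExecutorService threadB = Executors.newSingleThreadExecutor();
    try {
      CONTEXT.get().put("core", "gettingstarted_shard2_replica_n2"); // caller thread
      ContextSnapshot helper = new ContextSnapshot(); // captured on the caller thread

      // Like onRequestBegin: the context is installed on thread A only.
      threadA.submit(helper::installOnCurrentThread).get();

      // Like response processing on another thread: thread B has no context.
      Future<String> withoutSnapshot = threadB.submit(() -> CONTEXT.get().get("core"));

      // The wrapped task installs the caller's snapshot on thread B itself.
      String[] seen = new String[1];
      threadB.submit(helper.wrap(() -> seen[0] = CONTEXT.get().get("core"))).get();

      return new String[] {withoutSnapshot.get(), seen[0]};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      threadA.shutdown();
      threadB.shutdown();
      CONTEXT.remove();
    }
  }

  public static void main(String[] args) {
    String[] result = run();
    System.out.println("thread B without snapshot: " + result[0]);
    System.out.println("thread B with wrapped task: " + result[1]);
  }
}
```

Running `main` prints `null` for thread B without the snapshot and `gettingstarted_shard2_replica_n2` for the wrapped task. The design choice in this sketch is that the snapshot is taken exactly once, on the caller thread, and each unit of work installs it explicitly instead of relying on an earlier callback having run on the same thread; restoring the previous context matters because pool threads are reused.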