[ 
https://issues.apache.org/jira/browse/SOLR-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698160#comment-17698160
 ] 

Alex Deparvu commented on SOLR-16676:
-------------------------------------

It seems the trouble is that, under some circumstances, the request callbacks 
are called on different threads (I am assuming high load or memory pressure). 
Some assumptions I made no longer hold, and the following pattern is not 
correct:
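```
// Created on the caller thread, so it captures the caller's MDC context.
MDCCopyHelper mdcCopyHelper = new MDCCopyHelper();
// Assumed to run on the same thread that later starts response processing.
req.onRequestBegin(mdcCopyHelper);
// omitted for brevity
req.onComplete(mdcCopyHelper);
```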
The failure unfolds as follows:
* new MDCCopyHelper() runs on the caller thread and picks up the correct MDC 
context (as expected).
* The onRequestBegin callback runs on thread 3: 'httpShardExecutor-18-thread-3'.
* 'response processing started' happens on a different thread, 
'httpShardExecutor-18-thread-2', which has no MDC context available to 
push forward.

More often than not, the last two events seem to happen on the same thread, 
which allows the MDC context to be copied over correctly.
I am still evaluating options for the fix.
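
For illustration only, here is a minimal sketch of the general technique at 
stake: snapshot the context on the caller thread and restore it inside 
whichever thread ends up running the task, instead of relying on two callbacks 
landing on the same thread. This is not the Solr code and not necessarily the 
fix that will be chosen. The class name, the `wrap` helper, and the 
ThreadLocal map standing in for the logging MDC are all assumptions made up 
for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names exist in Solr.
final class MdcPropagationSketch {
  // Stand-in for a logging MDC: a per-thread map that other threads cannot see.
  private static final ThreadLocal<Map<String, String>> CONTEXT =
      ThreadLocal.withInitial(HashMap::new);

  // Snapshots the calling thread's context now and restores it around the task,
  // on whatever thread eventually runs it.
  static <T> Callable<T> wrap(Callable<T> task) {
    Map<String, String> captured = new HashMap<>(CONTEXT.get());
    return () -> {
      Map<String, String> previous = CONTEXT.get();
      CONTEXT.set(new HashMap<>(captured));
      try {
        return task.call();
      } finally {
        CONTEXT.set(previous); // leave the worker thread as we found it
      }
    };
  }

  // Sets a context value on the caller thread, then reads the context from a
  // worker thread, with or without the capturing wrapper.
  static Map<String, String> readOnWorker(boolean capture) {
    CONTEXT.get().put("shard", "shard2");
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<Map<String, String>> task = () -> new HashMap<>(CONTEXT.get());
      Callable<Map<String, String>> toRun = capture ? wrap(task) : task;
      Future<Map<String, String>> result = pool.submit(toRun);
      return result.get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
      CONTEXT.remove(); // reset the caller thread's context
    }
  }

  public static void main(String[] args) {
    System.out.println(readOnWorker(false)); // prints {}
    System.out.println(readOnWorker(true));  // prints {shard=shard2}
  }
}
```

Without the wrapper the worker thread sees an empty context, which mirrors the 
bare 'httpShardExecutor-18-thread-2' name above. With the wrapper the snapshot 
travels with the task, so it does not matter which pool thread picks it up.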


> Http2SolrClient loss of MDC context
> -----------------------------------
>
>                 Key: SOLR-16676
>                 URL: https://issues.apache.org/jira/browse/SOLR-16676
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 9.0, 9.1
>            Reporter: Alex Deparvu
>            Priority: Minor
>             Fix For: 9.2
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The Http2SolrClient loses MDC context information when running an async 
> request in Solr 9.x.
> The issue is that the 'Request#send' [0] call is itself async. By the time 
> the response listener kicks in to push the response processing to the 
> executor, the MDC context is already lost, so the executor no longer has 
> access to the original MDC in order to push it forward onto the thread that 
> will process the response.
>  
> This is very difficult to capture on a running system; there are no logs 
> during this window. I only saw it because I was specifically looking at 
> thread names for a different reason.
> This is how it is reflected in the thread names:
>  - how it should be (Solr 8 style, containing all MDC data): 
> {quote}{{httpShardExecutor-5-thread-19-processing-gettingstarted_shard2_replica_n2
>  core_node5 localhost:8983_solr gettingstarted shard2 localhost-4}}
> {quote}
>  - how it is in Solr 9 (due to no MDC context)
> {quote}httpShardExecutor-5-thread-10
> {quote}
> I can't tell if there is anything breaking due to this.
> [0] 
> [https://github.com/apache/solr/blob/7eee7a8ad3c43db0dc26c663dd16764d1fb3dbf4/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L458]


