[ 
https://issues.apache.org/jira/browse/SOLR-15783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-15783:
--------------------------------------
    Attachment: SOLR-15783.patch
        Status: Open  (was: Open)

There's a lot going on in {{MDCLoggingContext}} that I don't really understand 
(some of it due to historic choices that I can't make sense of, and some of it 
seems like the result of gradual evolution in ways it wasn't intended to – or 
in some cases: that it was specifically designed to prevent) that probably 
could/should be reviewed more in depth ... but IMO it seems like (independent 
of any of the choices made there) we can safely (and should) make the following 
changes:
 * {{SolrDispatchFilter.doFilter}} should populate the MDC with nodeID (when 
available) before executing any code
 ** it ultimately comes from CoreContainer, which SolrDispatchFilter has on init
 ** no reason to wait for {{HttpSolrCall.call()}}
 * trace_id should get populated in the MDC as soon as the top level "scope" is 
entered
 ** ie: set this in {{SolrDispatchFilter.doFilter}}
 ** no reason to wait for {{HttpSolrCall.call()}}
 * {{SolrDispatchFilter}} should ensure that _ALL_ MDC values (not just those 
explicitly managed by MDCLoggingContext) are "cleaned up"
 ** To prevent leaks like the current issue with trace_id, as well as any other 
values that might leak from other places in the code now or in the future
 ** TestHarness should do the same thing so non-jetty tests get the same level 
of cleanup (not really a "bug" if they did since it would be test only code – 
but just to prevent any risk of confusion)

...the attached patch makes these changes by introducing an (AutoCloseable) 
{{MDCSnapshot}} modeled after SLF4J's existing {{MDC.MDCCloseable}} class

[~dsmiley] : Any comments/concerns here?

> MDC Values can leak between request threads, trace_id currently does
> --------------------------------------------------------------------
>
>                 Key: SOLR-15783
>                 URL: https://issues.apache.org/jira/browse/SOLR-15783
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-15783.patch
>
>
> Currently, the only general "MDC Cleanup" code that happens from request to 
> request is done by {{MDCLoggingContext.clear()}} and 
> {{MDCLoggingContext.reset()}} ... but these methods (in addition to only 
> being concerned with a static list of 5 specific MDC values) are currently 
> designed and used in such a way that things like "trace_id" are only "reset" 
> at the _begining_ of {{HttpSolrCall.call()}} ... which means any request 
> specific log messages that happen before that point (like authn errors) will 
> incorrectly refer to the trace_id of whatever request the thread _previously_ 
> processed.
> As a trivial example of this, if you apply a patch like...
> {noformat}
> diff --git 
> a/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java
>  
> b/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java
> index 5f481c48e92..97c864617f4 100644
> --- 
> a/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java
> +++ 
> b/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java
> @@ -109,7 +109,7 @@ public class JettySolrRunner {
>  
>    private static final Logger log = 
> LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
>  
> -  private static final int THREAD_POOL_MAX_THREADS = 10000;
> +  private static final int THREAD_POOL_MAX_THREADS = 10; // nocommit
>    // NOTE: needs to be larger than 
> SolrHttpClient.threadPoolSweeperMaxIdleTime
>    private static final int THREAD_POOL_MAX_IDLE_TIME_MS = 260000;
>  
> diff --git 
> a/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java 
> b/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
> index 28167270099..fa6bb9a4eb8 100644
> --- a/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
> +++ b/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
> @@ -484,6 +484,9 @@ public class SolrDispatchFilter extends BaseSolrFilter {
>          response.sendError(429, errorMessage);
>        }
>  
> +      log.error("nocommit: fake log message before HttpSolrCall.call() as if 
> auth error happened: MDC_claims={} ; real_traceId={}",
> +                
> org.slf4j.MDC.get(org.apache.solr.logging.MDCLoggingContext.TRACE_ID), 
> span.context().toTraceId());
> +      
>        AtomicReference<HttpServletRequest> wrappedRequest = new 
> AtomicReference<>();
>        if (!authenticateRequest(request, response, wrappedRequest)) { // the 
> response and status code have already been sent
>          return;
> {noformat}
> ...and then run {{./gradlew test --tests TestDistributedTracing 
> -Ptests.verbose=true}} you can pick out any single jetty query thread and see 
> something like...
> {noformat}
>   2> 4103 ERROR (qtp1310066330-70) [] o.a.s.s.SolrDispatchFilter nocommit: 
> fake log message before HttpSolrCall.call() as if auth error happened: 
> MDC_claims=null ; real_traceId=1
>   2> 6146 ERROR (qtp1310066330-70) [n:127.0.0.1:34135_solr t:1] 
> o.a.s.s.SolrDispatchFilter nocommit: fake log message before 
> HttpSolrCall.call() as if auth error happened: MDC_claims=1 ; real_traceId=22
>   2> 6661 ERROR (qtp1310066330-70) [n:127.0.0.1:34135_solr t:22] 
> o.a.s.s.SolrDispatchFilter nocommit: fake log message before 
> HttpSolrCall.call() as if auth error happened: MDC_claims=22 ; real_traceId=50
> {noformat}
> (Note also that during authn, the nodeId isn't populated for a 'new' thread, 
> it's only set if it was leaked from the last request processed)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to