[ https://issues.apache.org/jira/browse/SOLR-15783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris M. Hostetter updated SOLR-15783: -------------------------------------- Attachment: SOLR-15783.patch Status: Open (was: Open) There's a lot going on in {{MDCLoggingContext}} that I don't really understand (some of it due to historic choices that I can't make sense of, and some of it seems like the result of gradual evolution in ways it wasn't intended to – or in some cases: that it was specifically designed to prevent) that probably could/should be reviewed more in depth ... but IMO it seems like (independent of any of the choices made there) we can safely (and should) make the following changes: * {{SolrDispatchFilter.doFilter}} should populate the MDC with nodeID (when available) before executing any code ** it ultimately comes from CoreContainer, which SolrDispatchFilter has on init ** no reason to wait for {{HttpSolrCall.call()}} * trace_id should get populated in the MDC as soon as the top level "scope" is entered ** ie: set this in {{SolrDispatchFilter.doFilter}} ** no reason to wait for {{HttpSolrCall.call()}} * {{SolrDispatchFilter}} should ensure that _ALL_ MDC values (not just those explicitly managed by MDCLoggingContext) are "cleaned up" ** To prevent leaks like the current issue with trace_id, as well as any other values that might leak from other places in the code now or in the future ** TestHarness should do the same thing so non-jetty tests get the same level of cleanup (not really a "bug" if they did since it would be test only code – but just to prevent any risk of confusion) ...the attached patch makes these changes by introducing an (AutoCloseable) {{MDCSnapshot}} modeled after SLF4J's existing {{MDC.MDCCloseable}} class [~dsmiley] : Any comments/concerns here? > MDC Values can leak between request threads, trace_id currently does > -------------------------------------------------------------------- > > Key: SOLR-15783 > URL: https://issues.apache.org/jira/browse/SOLR-15783 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Chris M. Hostetter > Assignee: Chris M. Hostetter > Priority: Major > Attachments: SOLR-15783.patch > > > Currently, the only general "MDC Cleanup" code that happens from request to > request is done by {{MDCLoggingContext.clear()}} and > {{MDCLoggingContext.reset()}} ... but these methods (in addition to only > being concerned with a static list of 5 specific MDC values) are currently > designed and used in such a way that things like "trace_id" are only "reset" > at the _begining_ of {{HttpSolrCall.call()}} ... which means any request > specific log messages that happen before that point (like authn errors) will > incorrectly refer to the trace_id of whatever request the thread _previously_ > processed. > As a trivial example of this, if you apply a patch like... > {noformat} > diff --git > a/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java > > b/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java > index 5f481c48e92..97c864617f4 100644 > --- > a/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java > +++ > b/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java > @@ -109,7 +109,7 @@ public class JettySolrRunner { > > private static final Logger log = > LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); > > - private static final int THREAD_POOL_MAX_THREADS = 10000; > + private static final int THREAD_POOL_MAX_THREADS = 10; // nocommit > // NOTE: needs to be larger than > SolrHttpClient.threadPoolSweeperMaxIdleTime > private static final int THREAD_POOL_MAX_IDLE_TIME_MS = 260000; > > diff --git > a/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java > b/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java > index 28167270099..fa6bb9a4eb8 100644 > --- a/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java > +++ b/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java > @@ -484,6 +484,9 @@ public class SolrDispatchFilter extends BaseSolrFilter { > response.sendError(429, errorMessage); > } > > + log.error("nocommit: fake log message before HttpSolrCall.call() as if > auth error happened: MDC_claims={} ; real_traceId={}", > + > org.slf4j.MDC.get(org.apache.solr.logging.MDCLoggingContext.TRACE_ID), > span.context().toTraceId()); > + > AtomicReference<HttpServletRequest> wrappedRequest = new > AtomicReference<>(); > if (!authenticateRequest(request, response, wrappedRequest)) { // the > response and status code have already been sent > return; > {noformat} > ...and then run {{./gradlew test --tests TestDistributedTracing > -Ptests.verbose=true}} you can pick out any single jetty query thread and see > something like... > {noformat} > 2> 4103 ERROR (qtp1310066330-70) [] o.a.s.s.SolrDispatchFilter nocommit: > fake log message before HttpSolrCall.call() as if auth error happened: > MDC_claims=null ; real_traceId=1 > 2> 6146 ERROR (qtp1310066330-70) [n:127.0.0.1:34135_solr t:1] > o.a.s.s.SolrDispatchFilter nocommit: fake log message before > HttpSolrCall.call() as if auth error happened: MDC_claims=1 ; real_traceId=22 > 2> 6661 ERROR (qtp1310066330-70) [n:127.0.0.1:34135_solr t:22] > o.a.s.s.SolrDispatchFilter nocommit: fake log message before > HttpSolrCall.call() as if auth error happened: MDC_claims=22 ; real_traceId=50 > {noformat} > (Note also that during authn, the nodeId isn't populated for a 'new' thread, > it's only set if it was leaked from the last request processed) -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org