[ 
https://issues.apache.org/jira/browse/CAMEL-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891046#comment-17891046
 ] 

Freeman Yue Fang commented on CAMEL-21302:
------------------------------------------

Hi [~jpoth], [~davsclaus],

I have an update on this issue after reading the code in camel-tracing and 
camel-opentelemetry.

1. For the scenario where a Camel producer endpoint sends the request 
asynchronously and handles the response in a different thread, camel-tracing 
and camel-opentelemetry already have a mechanism to accommodate it.
An ExchangeAsyncStartedEvent triggers a call into this method of 
org.apache.camel.tracing.ActiveSpanManager:
{code}
    /**
     * If the underlying span is active, closes its scope without ending the span. This method should be called after
     * async execution is started on the same thread on which span was activated. ExchangeAsyncStartedEvent is used to
     * notify about it.
     *
     * @param exchange The exchange
     */
    public static void endScope(Exchange exchange) {
        Holder holder = exchange.getProperty(ExchangePropertyKey.ACTIVE_SPAN, Holder.class);
        if (holder != null) {
            holder.closeScope();
        }
    }
{code}

This event is fired when an asynchronous Camel producer endpoint (for example, 
a camel-cxf producer) sends a request asynchronously. Together, the 
ExchangeAsyncStartedEvent and the endScope method in 
org.apache.camel.tracing.ActiveSpanManager ensure that the producer CLIENT 
scope is closed on the expected thread.
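
To illustrate the idea, here is a minimal toy sketch, not the Camel or 
OpenTelemetry code. It assumes a plain ThreadLocal as a stand-in for the 
thread-bound tracing context. The names AsyncScopeSketch, ToySpan and CURRENT 
are hypothetical. The scope is closed on the thread that opened it as soon as 
async execution has started, while the span itself is ended later by the 
thread that handles the response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: CURRENT stands in for the thread-bound tracing context and
// ToySpan for a span. None of these names exist in Camel or OpenTelemetry.
class AsyncScopeSketch {
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static final class ToySpan {
        final String name;
        final AtomicBoolean ended = new AtomicBoolean(false);

        ToySpan(String name) {
            this.name = name;
        }
    }

    // Returns { scope cleared on the calling thread, span ended by the response thread }.
    static boolean[] run() {
        ToySpan span = new ToySpan("cxfrs CLIENT");
        CURRENT.set(span.name); // scope opened on the calling thread

        // The response is handled on another thread, which only ends the span.
        CompletableFuture<Void> response =
                CompletableFuture.runAsync(() -> span.ended.set(true));

        // Async execution has started: close the scope here, on the thread
        // that opened it, without ending the span.
        CURRENT.remove();
        boolean scopeCleared = CURRENT.get() == null;

        response.join(); // wait until the response thread has ended the span
        return new boolean[] { scopeCleared, span.ended.get() };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("scope cleared on calling thread: " + result[0]);
        System.out.println("span ended by response thread: " + result[1]);
    }
}
```

The point of the split is that a thread-bound scope can only be cleaned up 
safely by the thread that opened it, whereas ending a span is not tied to a 
thread.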

2. Why is the issue exposed in AsyncCxfTest.java?
The problem actually comes from the DirectProducer used in the test case.
The leaked context comes from DirectProducer, not from CxfRsProducer (the 
leaked span is the direct CLIENT span, not the cxfrs CLIENT span):
{code}
There must be no leaking span after test ==> expected: <{}> but was: 
<{opentelemetry-trace-span-key=SdkSpan{traceId=c29955d3b94b5673cc9ab23f43938a92,
 spanId=d6fa2ad9cdd6c636, 
parentSpanContext=ImmutableSpanContext{traceId=c29955d3b94b5673cc9ab23f43938a92,
 spanId=c0f0bed8f3615a2d, traceFlags=01, 
traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, 
name=send, kind=CLIENT, attributes=AttributesMap{data={url.scheme=direct, 
camel.uri=direct://send, url.path=send, component=camel-direct}, capacity=128, 
totalAddedValues=4}, status=ImmutableStatusData{statusCode=UNSET, 
description=}, totalRecordedEvents=0, totalRecordedLinks=0, 
startEpochNanos=1729272514478877531, endEpochNanos=1729272514491415078}}>
{code}

And if we take a look at the related code in DirectProducer.java
{code}
            } else {
                // the consumer may be forced synchronous
                if (consumer.getEndpoint().isSynchronous()) {
                    consumer.getProcessor().process(exchange);
                    callback.done(true);
                    return true;
                } else {
                    return consumer.getAsyncProcessor().process(exchange, callback);
                }
            }
{code}

The above code
{code}
         return consumer.getAsyncProcessor().process(exchange, callback);
{code}

calls the next DirectConsumer endpoint in the same thread, which creates a 
direct INTERNAL span/scope. This disturbs the scope/span stack, so the 
previous camel-direct producer CLIENT scope cannot be closed properly, which 
causes the leak.
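
As a rough illustration of how a disturbed scope stack can leave a context 
behind, here is a toy sketch, not the Camel or OpenTelemetry implementation. 
It assumes that a scope can only be closed while it is on top of the 
thread's stack and that a close request for any other scope is ignored. The 
exact close order inside Camel is an assumption here, and the names 
ScopeLeakSketch, open and close are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: a per-thread stack of active scope names. Closing a scope
// that is not on top of the stack is ignored (an assumption of this sketch).
class ScopeLeakSketch {
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    static String current() {
        return STACK.get().peek(); // null when no scope is active
    }

    static void open(String scope) {
        STACK.get().push(scope);
    }

    static boolean close(String scope) {
        Deque<String> stack = STACK.get();
        if (scope.equals(stack.peek())) {
            stack.pop();
            return true;
        }
        return false; // not the current scope: the close request is ignored
    }

    // The CLIENT scope is closed while the INTERNAL scope is still on top.
    static String afterOutOfOrderClose() {
        STACK.get().clear();
        open("direct CLIENT");
        open("direct INTERNAL"); // consumer runs on the same thread
        close("direct CLIENT");  // ignored, INTERNAL is on top
        close("direct INTERNAL");
        return current();        // the CLIENT scope is still active
    }

    // Properly nested: inner scope closed first, then the outer one.
    static String afterNestedClose() {
        STACK.get().clear();
        open("direct CLIENT");
        open("direct INTERNAL");
        close("direct INTERNAL");
        close("direct CLIENT");
        return current();        // nothing left on the stack
    }

    public static void main(String[] args) {
        System.out.println("out-of-order close leaves: " + afterOutOfOrderClose());
        System.out.println("nested close leaves: " + afterNestedClose());
    }
}
```

In this model the leftover entry plays the role of the leaked direct CLIENT 
context reported by the test.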

I just sent a PR:
https://github.com/apache/camel/pull/16008
In this commit of the PR:
https://github.com/apache/camel/pull/16008/commits/b9bf81b24c8a3c24117cc5454cafac906e8dd136#diff-62012c8f6e9737b3fe210a2abbb42fee51e22315ad31b82caf9df4ba3c53e9f5
I added CurrentSpanTest.testDirectToDirectToAsync, which exposes the same 
problem even without a cxfrs endpoint involved.

Best Regards
Freeman


> camel-opentelemetry context leak with direct async producer
> -----------------------------------------------------------
>
>                 Key: CAMEL-21302
>                 URL: https://issues.apache.org/jira/browse/CAMEL-21302
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-opentelemetry
>            Reporter: John Poth
>            Assignee: Freeman Yue Fang
>            Priority: Major
>
> There seems to be an OTel context leak when using a CXF producer in async 
> mode. This causes different requests to have the same _traceId_. As a 
> workaround, setting _synchronous=true_ on the CXF producer resolves the 
> issue. Here's a reproducer:
> {code:java}
> @Override
> protected RoutesBuilder createRouteBuilder() {
>     return new RouteBuilder() {
>         @Override
>         public void configure() {
>             from("direct:start").routeId("myRoute")
>                     .to("direct:send")
>                     .end();
>             from("direct:send")
>                     .log("message")
>                     .to("cxfrs:http://localhost:" + port1
>                         + "/rest/helloservice/sayHello?synchronous=false"); // setting to 'true' resolves the issue
>             restConfiguration()
>                     .port(port1);
>             rest("/rest/helloservice")
>                     .post("/sayHello").routeId("rest-GET-say-hi")
>                     .to("direct:sayHi");
>             from("direct:sayHi")
>                     .routeId("mock-GET-say-hi")
>                     .log("example")
>                     .to("mock:end");
>         }
>     };
> }
> {code}
>  
> I've added the complete unit here: 
> https://github.com/apache/camel/blob/7d83a62b8e442dc9ac6fd79b153192add940301e/components/camel-opentelemetry/src/test/java/org/apache/camel/opentelemetry/AsyncCxfTest.java



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
