Thanks for digging into this. We handle async calls in providers and/or
interceptors to make sure the spans are always properly closed. Usually
the tricky part is the point where spans are sent off to the reporter, so
the flakiness may come from there (or there may indeed be a real issue). Thanks.
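
As a rough illustration of the failure mode discussed in this thread, here is
a minimal, self-contained Java sketch. It deliberately does not use the Brave
API: ToyTracer, handleAsyncRequest and the finishOnCompletion flag are
hypothetical names made up for the example. It only shows the general idea
that a span which is started but never finished stays in flight and never
reaches the reporter, while finishing it from the async completion callback
reports it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a tracer; this is NOT the Brave API.
public class SpanLifecycleSketch {

    /** Toy tracer: a span stays "in flight" until finished, then it is reported. */
    static final class ToyTracer {
        final Set<String> inFlight = new LinkedHashSet<>();
        final List<String> reported = new ArrayList<>();

        synchronized String start(String name) {
            inFlight.add(name);
            return name;
        }

        synchronized void finish(String name) {
            // Only spans that were actually started can be reported.
            if (inFlight.remove(name)) {
                reported.add(name);
            }
        }
    }

    /** Simulates one async request; the flag decides whether the server span is finished. */
    static ToyTracer handleAsyncRequest(boolean finishOnCompletion) {
        ToyTracer tracer = new ToyTracer();
        String serverSpan = tracer.start("get /bookstore/books/async");

        // The child span is started and finished inside the async work.
        CompletableFuture<Void> response = CompletableFuture.runAsync(() -> {
            String child = tracer.start("processing books");
            tracer.finish(child);
        });

        if (finishOnCompletion) {
            // Finish the parent once the async work completes (normally or with an error).
            response = response.whenComplete((ok, err) -> tracer.finish(serverSpan));
        }
        response.join();
        return tracer;
    }

    public static void main(String[] args) {
        ToyTracer leaked = handleAsyncRequest(false);
        System.out.println("no finish:   inFlight=" + leaked.inFlight + " reported=" + leaked.reported);

        ToyTracer closed = handleAsyncRequest(true);
        System.out.println("with finish: inFlight=" + closed.inFlight + " reported=" + closed.reported);
    }
}
```

In the first run the server span is left in flight, which resembles the
"inFlight" entry in the Tracer dump quoted below. In the second run both spans
are reported. Whether the real fix belongs in the provider, an interceptor or
the continuation resume path is a separate question that this sketch does not
answer.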
Best Regards,
Andriy Redko
On Wed, Aug 29, 2018, 3:46 PM Daniel Kulp <[email protected]> wrote:
>
>
> > On Aug 29, 2018, at 2:57 PM, Andrey Redko <[email protected]> wrote:
> >
> > Hey Dan,
> >
> > We could try to add a short delay (to let spans flush) or trigger a flush
> > forcibly (if that is feasible). Colm also reported a while back that a few
> > tests were failing for him. I will make it my priority to stabilize them.
> Thanks.
>
> I don’t think that will fix it in this case. For the async calls, I
> don’t think anything is “finishing” the parent span. The
> BraveTracerContext.wrap call creates a child scope which is then closed,
> but nothing actually finishes the span that is created in
> AbstractBraveProvider (line 58). That span is propagated (line 70) and
> then used as the parent, but as far as I can tell it is never finished
> and thus never flushed out until it is garbage collected or something. Not
> sure if, on the continuation resume, we need to grab the span and create a
> scope or something for it.
>
> Not sure if that helps at all.
>
> Dan
>
>
> >
> > Best Regards,
> > Andriy Redko
> >
> > On Wed, Aug 29, 2018, 2:33 PM Daniel Kulp <[email protected]> wrote:
> >
> >>
> >> The tracing systests have been very unstable for me, failing more often
> >> than not with failures in
> >> org.apache.cxf.systest.jaxrs.tracing.brave.BraveTracingTest. Which test
> >> in that class eventually fails seems relatively random. In each case,
> >> the number of spans is greater than what is expected. Is anyone else
> >> seeing that?
> >>
> >> I tried digging into it and it LOOKS like the calls to “get
> >> /bookstore/books/async” are leaving an “inFlight” span in the Tracer.
> >> That span is then delivered at some point in the future, which then
> >> causes a test to fail.
> >>
> >> 0: {"traceId":"3a5f1a7d2de45f49","parentId":"3a5f1a7d2de45f49","id":"b0f4e2ddef4251f5","name":"processing books","timestamp":1535566440433652,"duration":200595,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"}}
> >> 1: {"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","kind":"SERVER","name":"get /bookstore/books/async","timestamp":1535566440423025,"duration":212695,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"tags":{"http.method":"GET","http.path":"/bookstore/books/async"}}
> >>
> >> Tracer{inFlight=[{"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"shared":true}], reporter=org.apache.cxf.systest.brave.TestSpanReporter@1da2cb77}
> >>
> >>
> >> Is there something missing on the server side in the async case to close
> >> off the span or something?
> >>
> >>
> >> --
> >> Daniel Kulp
> >> [email protected] - http://dankulp.com/blog
> >> Talend Community Coder - http://talend.com <http://coders.talend.com/>
>
> --
> Daniel Kulp
> [email protected] - http://dankulp.com/blog
> Talend Community Coder - http://talend.com <http://coders.talend.com/>
>