Re: Distributed Tracing in Apache Beam

2020-04-21 Thread Kenneth Knowles
+dev I don't have a ton of time to dig into this, but I wanted to say that this is very cool and drop a couple of pointers (which you may already know about), like Explaining Outputs in Modern Data Analytics [1], which was covered by The Morning Paper [2]. This just happens to be something I rea

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi Alexey, I think you’re right about the wrapper; it’s likely unnecessary, as I think I’d have enough information in the headers to rehydrate my “tracer” that communicates the traces/spans to Jaeger as needed. I’d love to not have to touch those or muddy the waters with a wrapper class, additi

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Alexey Romanenko
Hi Rion, In general, yes, it sounds reasonable to me. I just do not see why you need an extra Traceable wrapper. Do you need to keep some temporary information there that you don’t want to store in Kafka record headers? PS: Now I have started to think that we probably have to change an interfa

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi Alexey, So this is currently the approach that I'm taking: basically, creating a wrapper Traceable class that will contain all of my record information as well as the data necessary to update the traces for that record. It requires an extra step and will likely mean persisting something along
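
As a rough illustration of the wrapper idea described in this message, here is a minimal sketch in plain Java. It is not the poster's actual class: the name Traceable comes from the thread, but the fields (traceId, parentSpanId) and the map helper are assumptions made for illustration, and no Beam or Jaeger APIs are used.

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical wrapper that pairs a record value with the trace context
 * needed to continue that record's trace in a later pipeline step.
 */
final class Traceable<T> {
    private final T value;
    private final String traceId;      // assumed: trace identifier as a string
    private final String parentSpanId; // assumed: span to attach child spans to

    Traceable(T value, String traceId, String parentSpanId) {
        this.value = value;
        this.traceId = Objects.requireNonNull(traceId, "traceId");
        this.parentSpanId = Objects.requireNonNull(parentSpanId, "parentSpanId");
    }

    T value() { return value; }
    String traceId() { return traceId; }
    String parentSpanId() { return parentSpanId; }

    /** Transforms the wrapped value while carrying the trace context along unchanged. */
    <R> Traceable<R> map(Function<? super T, ? extends R> fn) {
        return new Traceable<>(fn.apply(value), traceId, parentSpanId);
    }
}
```

The cost the message mentions is visible here: every step has to accept and return the wrapper rather than the bare record, and a real pipeline would also need a way to serialize it between steps.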

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Alexey Romanenko
Not sure if it will help, but KafkaIO lets you keep all meta information while reading (using KafkaRecord) and writing (using ProducerRecord). So, you can keep your tracing id in the record headers as you did with Kafka Streams. > On 17 Apr 2020, at 18:58, Rion Williams wrote: > > Hi Alex,
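
To make the header-based suggestion above concrete, here is a minimal sketch in plain Java. It deliberately does not use the real KafkaRecord or ProducerRecord classes: Rec is a hypothetical stand-in for a record with byte-valued headers, and the header name "trace-id" is an assumption. It only shows the shape of the idea, which is to read the tracing id from the incoming headers and copy the headers onto the outgoing record.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class HeaderTracePropagation {
    /** Hypothetical header name; use whatever key your tracer already writes. */
    static final String TRACE_HEADER = "trace-id";

    /** Stand-in for a Kafka record: a value plus byte-valued headers. */
    static final class Rec {
        final String value;
        final Map<String, byte[]> headers;

        Rec(String value, Map<String, byte[]> headers) {
            this.value = value;
            this.headers = new LinkedHashMap<>(headers); // defensive copy
        }
    }

    /** Returns the tracing id stored in the headers, or null if it is absent. */
    static String readTraceId(Rec record) {
        byte[] raw = record.headers.get(TRACE_HEADER);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    /** Example processing step: changes the value and carries the headers forward. */
    static Rec process(Rec in) {
        String newValue = in.value.toUpperCase(Locale.ROOT);
        return new Rec(newValue, in.headers);
    }
}
```

With this shape, no wrapper type is needed: the tracing id travels with the record's own metadata, and each step can rebuild its tracing state from readTraceId.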

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi Alex, As mentioned before, I'm in the process of migrating a pipeline of several Kafka Streams applications over to Apache Beam and I'm hoping to leverage the tracing infrastructure that I had established using Jaeger whenever I can, but specifically to trace an element as it flows through a

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Alex Van Boxel
Can you explain a bit more about what you want to achieve here? Do you want to trace how your elements go through the pipeline, or do you want to see how every ParDo interacts with external systems? On Fri, Apr 17, 2020, 17:38 Rion Williams wrote: > Hi all, > > I'm reaching out today to inquire if Apach