I see, thanks Weston. This is a nice tracing utility and I will give it a shot. It might be more information than I actually want, though, so I might just use a print statement.
As a side question: what do most Arrow devs use for debugging compute-related code? I am new to this and tried gdb, but ended up seeing incorrect data (I observed an ExecBatch with a negative length in gdb, but couldn't observe it when using print statements). Someone suggested that this could be because compiling with optimizations can lead to anomalous gdb behavior, so I am curious what other people do.

Li

On Tue, Apr 19, 2022 at 12:22 PM Weston Pace <weston.p...@gmail.com> wrote:
> The EVENT macro is specific to open telemetry tracing. So if `side`
> is only used to populate the event then I think you will need to
> surround the entire block with:
>
> ```
> #ifdef ARROW_WITH_OPENTELEMETRY
> int side = ...
> EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});
> #endif
> ```
>
> If you want to see it in action then you can enable open telemetry by
> turning on ARROW_WITH_OPENTELEMETRY in the cmake options. However, to
> actually get output you will need to tell OT where to send output.
> The simplest way to do this is to use the ARROW_TRACING_BACKEND
> environment variable. You can see all the options we have at the
> moment in src/arrow/util/tracing_internal.cc but a simple choice is
> "ostream" which dumps everything to (I think) stdout.
>
> Example:
>
> ```
> ARROW_TRACING_BACKEND=ostream ./debug/arrow-dataset-scanner-test \
>   --gtest_filter=TestScannerThreading/TestScanner.FilteredScanNested/2Threaded1d1b1024r
> ```
>
> Yields something like:
>
> ```
> {
>   name : SinkNode:
>   trace_id : 1e39508fe9fe74bfc1c39cfbe9b63d55
>   span_id : afb1b87450748124
>   tracestate :
>   parent_span_id: 4b46e64fb1469f90
>   start : 1650385199799132739
>   duration : 9878711
>   description :
>   span kind : Internal
>   status : Unset
>   attributes :
>     thread.id: 140287607824768
>     node.detail: :SinkNode{}
>     node.kind: SinkNode
>     node.label:
>   events :
>     {
>       name : InputFinished
>       timestamp : 1650385199804563001
>       attributes :
>         batches.length: 1
>     }
>     {
>       name : InputReceived
>       timestamp : 1650385199807960461
>       attributes :
>         batch.length: 512
>     }
>   links :
>   resources :
>     service.name: unknown_service
>     telemetry.sdk.version: 1.3.0
>     telemetry.sdk.name: opentelemetry
>     telemetry.sdk.language: cpp
>   instr-lib : arrow
> }
> ```
>
> To get more complete output from OT you will eventually want to use
> the http exporter and export the data to some kind of tool like Jaeger
> which can do visualizations of the data and offer flame charts.
>
> On Tue, Apr 19, 2022 at 5:39 AM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > Hello!
> >
> > I am trying to implement a new type of join in Arrow Compute engine (asof
> > join). I have been looking at code of HashJoinNode and found some debug
> > code that seems to be useful:
> >
> > e.g.:
> > EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});
> >
> > But when I try to use similar code in my ExecNode, I got an error:
> >
> > /home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/asof_join_node.cc:67:9:
> > error: unused variable ‘side’ [-Werror=unused-variable]
> >    67 |     int side = (input == inputs_[0]) ? 0 : 1;
> >       |         ^~~~
> >
> > (here is my code):
> > void InputReceived(ExecNode* input, ExecBatch batch) override {
> >     int side = (input == inputs_[0]) ? 0 : 1;
> >     EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});
> > }
> >
> > I wonder:
> > (1) Is there a special cmake flag I need to pass in to enable the EVENT
> > macro?
> > (2) What does the EVENT macro do and where does it output to?
> >
> > Thanks!
> > Li