Thanks! This is helpful. Will take a look. On Tue, Apr 19, 2022 at 7:00 PM Weston Pace <[email protected]> wrote:
> I can't speak for others but I do normal development with a debug > build & UBSAN turned on. I haven't had any problems using gdb in this > setup. Usually if I get a release-only bug it's because of timing or > memory being reused more aggressively in which case I would at least > start with ASAN and TSAN and move on to print statements from there. > However, release-only bugs have been quite rare in my experience. > > If you are going to use gdb then you should read [1] as there is a > very helpful gdb extension for debugging Arrow code. > > [1] https://arrow.apache.org/docs/cpp/gdb.html > > On Tue, Apr 19, 2022 at 8:34 AM Li Jin <[email protected]> wrote: > > > > I see. Thanks Weston. This is a nice tracing utils. I will give it a > shot. > > Although it might be more information that I actually want, I might just > > use a print statement. > > > > As a side question - what do most of Arrow dev use for debugging compute > > related code? I am new to this and tried to pdb but ended up seeing > > incorrect data (I observed an ExecBatch with negative length in gdb, but > > couldn't observe it when using print statements). Someone suggests that > it > > could be because compiling with optimization can lead to anomalous gdb > > behavior, so I am just curious what other people do. > > > > Li > > > > On Tue, Apr 19, 2022 at 12:22 PM Weston Pace <[email protected]> > wrote: > > > > > The EVENT macro is specific to open telemetry tracing. So if `side` > > > is only used to populate the event then I think you will need to > > > surround the entire block with: > > > > > > ``` > > > #ifdef ARROW_WITH_OPENTELEMETRY > > > int side = ... > > > EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", > > > side}}); > > > #endif > > > ``` > > > > > > If you want to see it in action then you can enable open telemetry by > > > turning on ARROW_WITH_OPENTELEMETRY in the cmake options. However, to > > > actually get output you will need to tell OT where to send output. > > > The simplest way to do this is to use the ARROW_TRACING_BACKEND > > > environment variable. You can see all the options we have at the > > > moment in src/arrow/util/tracing_internal.cc but a simple choice is > > > "ostream" which dumps everything to (I think) stdout. > > > > > > Example: > > > > > > ``` > > > ARROW_TRACING_BACKEND=ostream ./debug/arrow-dataset-scanner-test \ > > > > > > > --gtest_filter=TestScannerThreading/TestScanner.FilteredScanNested/2Threaded1d1b1024r > > > ``` > > > > > > Yields something like: > > > > > > ``` > > > { > > > name : SinkNode: > > > trace_id : 1e39508fe9fe74bfc1c39cfbe9b63d55 > > > span_id : afb1b87450748124 > > > tracestate : > > > parent_span_id: 4b46e64fb1469f90 > > > start : 1650385199799132739 > > > duration : 9878711 > > > description : > > > span kind : Internal > > > status : Unset > > > attributes : > > > thread.id: 140287607824768 > > > node.detail: :SinkNode{} > > > node.kind: SinkNode > > > node.label: > > > events : > > > { > > > name : InputFinished > > > timestamp : 1650385199804563001 > > > attributes : > > > batches.length: 1 > > > } > > > { > > > name : InputReceived > > > timestamp : 1650385199807960461 > > > attributes : > > > batch.length: 512 > > > } > > > links : > > > resources : > > > service.name: unknown_service > > > telemetry.sdk.version: 1.3.0 > > > telemetry.sdk.name: opentelemetry > > > telemetry.sdk.language: cpp > > > instr-lib : arrow > > > } > > > ``` > > > > > > To get more complete output from OT you will eventually want to use > > > the http exporter and export the data to some kind of tool like Jaeger > > > which can do visualizations of the data and offer flame charts. > > > > > > On Tue, Apr 19, 2022 at 5:39 AM Li Jin <[email protected]> wrote: > > > > > > > > Hello! > > > > > > > > I am trying to implement a new type of join in Arrow Compute engine > (asof > > > > join). I have been looking at code of HashJoinNode and found some > debug > > > > code that seems to be useful: > > > > > > > > e.g.: > > > > EVENT(span_, "InputReceived", {{"batch.length", batch.length}, > > > {"side", > > > > side}}); > > > > > > > > But when I try to use similar code in my ExecNode, I got an error: > > > > > > > > > > > > /home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/asof_join_node.cc:67:9: > > > > error: unused variable ‘side’ [-Werror=unused-variable] > > > > 67 | int side = (input == inputs_[0]) ? 0 : 1; > > > > | ^~~~ > > > > > > > > (here is my code): > > > > void InputReceived(ExecNode* input, ExecBatch batch) override { > > > > int side = (input == inputs_[0]) ? 0 : 1; > > > > EVENT(span_, "InputReceived", {{"batch.length", batch.length}, > > > {"side", > > > > side}}); > > > > } > > > > > > > > I wonder: > > > > (1) Is there a special cmake flag I need to pass in to enable the > EVENT > > > > marco? > > > > (2) What does the EVENT marco do and where does it output to? > > > > > > > > Thanks! > > > > Li > > > >
