I can't speak for others but I do normal development with a debug
build & UBSAN turned on.  I haven't had any problems using gdb in this
setup.  Usually if I get a release-only bug it's because of timing or
memory being reused more aggressively in which case I would at least
start with ASAN and TSAN and move on to print statements from there.
However, release-only bugs have been quite rare in my experience.

If you are going to use gdb then you should read [1] as there is a
very helpful gdb extension for debugging Arrow code.

[1] https://arrow.apache.org/docs/cpp/gdb.html

On Tue, Apr 19, 2022 at 8:34 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> I see. Thanks Weston. This is a nice tracing utils. I will give it a shot.
> Although it might be more information that I actually want, I might just
> use a print statement.
>
> As a side question - what do most of Arrow dev use for debugging compute
> related code? I am new to this and tried to pdb but ended up seeing
> incorrect data (I observed an ExecBatch with negative length in gdb, but
> couldn't observe it when using print statements). Someone suggests that it
> could be because compiling with optimization can lead to anomalous gdb
> behavior, so I am just curious what other people do.
>
> Li
>
> On Tue, Apr 19, 2022 at 12:22 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> > The EVENT macro is specific to open telemetry tracing.  So if `side`
> > is only used to populate the event then I think you will need to
> > surround the entire block with:
> >
> > ```
> > #ifdef ARROW_WITH_OPENTELEMETRY
> > int side = ...
> > EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side",
> > side}});
> > #endif
> > ```
> >
> > If you want to see it in action then you can enable open telemetry by
> > turning on ARROW_WITH_OPENTELEMETRY in the cmake options.  However, to
> > actually get output you will need to tell OT where to send output.
> > The simplest way to do this is to use the ARROW_TRACING_BACKEND
> > environment variable.  You can see all the options we have at the
> > moment in src/arrow/util/tracing_internal.cc but a simple choice is
> > "ostream" which dumps everything to (I think) stdout.
> >
> > Example:
> >
> > ```
> > ARROW_TRACING_BACKEND=ostream ./debug/arrow-dataset-scanner-test \
> >
> > --gtest_filter=TestScannerThreading/TestScanner.FilteredScanNested/2Threaded1d1b1024r
> > ```
> >
> > Yields something like:
> >
> > ```
> > {
> >   name          : SinkNode:
> >   trace_id      : 1e39508fe9fe74bfc1c39cfbe9b63d55
> >   span_id       : afb1b87450748124
> >   tracestate    :
> >   parent_span_id: 4b46e64fb1469f90
> >   start         : 1650385199799132739
> >   duration      : 9878711
> >   description   :
> >   span kind     : Internal
> >   status        : Unset
> >   attributes    :
> >     thread.id: 140287607824768
> >     node.detail: :SinkNode{}
> >     node.kind: SinkNode
> >     node.label:
> >   events        :
> >     {
> >       name          : InputFinished
> >       timestamp     : 1650385199804563001
> >       attributes    :
> >         batches.length: 1
> >     }
> >     {
> >       name          : InputReceived
> >       timestamp     : 1650385199807960461
> >       attributes    :
> >         batch.length: 512
> >     }
> >   links         :
> >   resources     :
> >     service.name: unknown_service
> >     telemetry.sdk.version: 1.3.0
> >     telemetry.sdk.name: opentelemetry
> >     telemetry.sdk.language: cpp
> >   instr-lib     : arrow
> > }
> > ```
> >
> > To get more complete output from OT you will eventually want to use
> > the http exporter and export the data to some kind of tool like Jaeger
> > which can do visualizations of the data and offer flame charts.
> >
> > On Tue, Apr 19, 2022 at 5:39 AM Li Jin <ice.xell...@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > I am trying to implement a new type of join in Arrow Compute engine (asof
> > > join). I have been looking at code of HashJoinNode and found some debug
> > > code that seems to be useful:
> > >
> > > e.g.:
> > >     EVENT(span_, "InputReceived", {{"batch.length", batch.length},
> > {"side",
> > > side}});
> > >
> > > But when I try to use similar code in my ExecNode, I got an error:
> > >
> > >
> > /home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/asof_join_node.cc:67:9:
> > > error: unused variable ‘side’ [-Werror=unused-variable]
> > >    67 |     int side = (input == inputs_[0]) ? 0 : 1;
> > >       |         ^~~~
> > >
> > > (here is my code):
> > >   void InputReceived(ExecNode* input, ExecBatch batch) override {
> > >     int side = (input == inputs_[0]) ? 0 : 1;
> > >     EVENT(span_, "InputReceived", {{"batch.length", batch.length},
> > {"side",
> > > side}});
> > >   }
> > >
> > > I wonder:
> > > (1) Is there a special cmake flag I need to pass in to enable the EVENT
> > > marco?
> > > (2) What does the EVENT marco do and where does it output to?
> > >
> > > Thanks!
> > > Li
> >

Reply via email to