Thanks! This is helpful. Will take a look.

On Tue, Apr 19, 2022 at 7:00 PM Weston Pace <[email protected]> wrote:

> I can't speak for others, but I do normal development with a debug
> build & UBSAN turned on.  I haven't had any problems using gdb in this
> setup.  Usually, if I get a release-only bug, it's because of timing or
> memory being reused more aggressively, in which case I would at least
> start with ASAN and TSAN and move on to print statements from there.
> However, release-only bugs have been quite rare in my experience.
>
> If you are going to use gdb then you should read [1] as there is a
> very helpful gdb extension for debugging Arrow code.
>
> [1] https://arrow.apache.org/docs/cpp/gdb.html
>
> On Tue, Apr 19, 2022 at 8:34 AM Li Jin <[email protected]> wrote:
> >
> > I see. Thanks Weston. This is a nice tracing utility. I will give it a
> > shot. Although it might be more information than I actually want, I might
> > just use a print statement.
> >
> > As a side question: what do most Arrow devs use for debugging
> > compute-related code? I am new to this and tried to use gdb, but ended up
> > seeing incorrect data (I observed an ExecBatch with negative length in gdb,
> > but couldn't observe it when using print statements). Someone suggested it
> > could be because compiling with optimizations can lead to anomalous gdb
> > behavior, so I am just curious what other people do.
> >
> > Li
> >
> > On Tue, Apr 19, 2022 at 12:22 PM Weston Pace <[email protected]> wrote:
> >
> > > The EVENT macro is specific to OpenTelemetry tracing.  So if `side`
> > > is only used to populate the event, then I think you will need to
> > > surround the entire block with:
> > >
> > > ```
> > > #ifdef ARROW_WITH_OPENTELEMETRY
> > > int side = ...
> > > EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});
> > > #endif
> > > ```
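
The guard pattern above can be tried outside Arrow with a minimal, self-contained C++ sketch. It assumes a stand-in macro that, like `EVENT` in a build without ARROW_WITH_OPENTELEMETRY (as the unused-variable error in the original question suggests), expands to nothing; `MY_EVENT`, `MY_WITH_TRACING` and `TwoInputNode` are hypothetical names, not Arrow's. It also shows the standard C++17 `[[maybe_unused]]` attribute as an alternative.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-ins, not Arrow's definitions: MY_WITH_TRACING plays the
// role of ARROW_WITH_OPENTELEMETRY and MY_EVENT the role of EVENT.
#ifdef MY_WITH_TRACING
#define MY_EVENT(name, value) (std::cout << (name) << "=" << (value) << '\n')
#else
// Without tracing the macro expands to nothing, so a variable computed only
// for the event would never be read.
#define MY_EVENT(name, value)
#endif

// Hypothetical two-input node, loosely modeled on the ExecNode in the question.
struct TwoInputNode {
  const void* inputs[2] = {nullptr, nullptr};
  int batches_received = 0;

  // Option 1 (the #ifdef guard): `side` is declared only in builds where the
  // event actually consumes it, so -Werror=unused-variable cannot fire.
  void InputReceived(const void* input) {
    ++batches_received;
#ifdef MY_WITH_TRACING
    int side = (input == inputs[0]) ? 0 : 1;
    MY_EVENT("side", side);
#endif
  }

  // Option 2 (standard C++17): keep the variable in every build and mark it
  // [[maybe_unused]] so the compiler does not warn when MY_EVENT drops it.
  void InputReceivedAlt(const void* input) {
    ++batches_received;
    [[maybe_unused]] int side = (input == inputs[0]) ? 0 : 1;
    MY_EVENT("side", side);
  }
};
```

The `#ifdef` variant removes the variable from non-tracing builds entirely; the `[[maybe_unused]]` variant keeps it (and its computation) but tells the compiler that the lack of use is intentional.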
> > >
> > > If you want to see it in action, you can enable OpenTelemetry by
> > > turning on ARROW_WITH_OPENTELEMETRY in the CMake options.  However, to
> > > actually get output you will need to tell OT where to send it.
> > > The simplest way to do this is with the ARROW_TRACING_BACKEND
> > > environment variable.  You can see all the options we support at the
> > > moment in src/arrow/util/tracing_internal.cc, but a simple choice is
> > > "ostream", which dumps everything to (I think) stdout.
> > >
> > > Example:
> > >
> > > ```
> > > ARROW_TRACING_BACKEND=ostream ./debug/arrow-dataset-scanner-test \
> > >   --gtest_filter=TestScannerThreading/TestScanner.FilteredScanNested/2Threaded1d1b1024r
> > > ```
> > >
> > > Yields something like:
> > >
> > > ```
> > > {
> > >   name          : SinkNode:
> > >   trace_id      : 1e39508fe9fe74bfc1c39cfbe9b63d55
> > >   span_id       : afb1b87450748124
> > >   tracestate    :
> > >   parent_span_id: 4b46e64fb1469f90
> > >   start         : 1650385199799132739
> > >   duration      : 9878711
> > >   description   :
> > >   span kind     : Internal
> > >   status        : Unset
> > >   attributes    :
> > >     thread.id: 140287607824768
> > >     node.detail: :SinkNode{}
> > >     node.kind: SinkNode
> > >     node.label:
> > >   events        :
> > >     {
> > >       name          : InputFinished
> > >       timestamp     : 1650385199804563001
> > >       attributes    :
> > >         batches.length: 1
> > >     }
> > >     {
> > >       name          : InputReceived
> > >       timestamp     : 1650385199807960461
> > >       attributes    :
> > >         batch.length: 512
> > >     }
> > >   links         :
> > >   resources     :
> > >     service.name: unknown_service
> > >     telemetry.sdk.version: 1.3.0
> > >     telemetry.sdk.name: opentelemetry
> > >     telemetry.sdk.language: cpp
> > >   instr-lib     : arrow
> > > }
> > > ```
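
As a sanity check on reading this output, here is a small C++ sketch that computes where each event falls inside the span. It assumes (the output does not say so) that `start` and the event `timestamp`s are nanoseconds since the Unix epoch and that `duration` is in nanoseconds, which fits 1650385199 s falling on 2022-04-19. The constant and function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Values copied from the span printed above. Assumption: all are nanoseconds
// (timestamps measured from the Unix epoch).
constexpr std::int64_t kSpanStartNs = 1650385199799132739;
constexpr std::int64_t kSpanDurationNs = 9878711;
constexpr std::int64_t kInputFinishedNs = 1650385199804563001;
constexpr std::int64_t kInputReceivedNs = 1650385199807960461;

// Offset of an event from the start of its span, in nanoseconds.
constexpr std::int64_t OffsetNs(std::int64_t event_ns) {
  return event_ns - kSpanStartNs;
}

// Converts nanoseconds to milliseconds for display.
inline double ToMillis(std::int64_t ns) {
  return static_cast<double>(ns) / 1e6;
}

// Prints the span duration and each event's offset from the span start.
inline void PrintSpanSummary() {
  std::cout << "span duration: " << ToMillis(kSpanDurationNs) << " ms\n"
            << "InputFinished at +" << ToMillis(OffsetNs(kInputFinishedNs))
            << " ms\n"
            << "InputReceived at +" << ToMillis(OffsetNs(kInputReceivedNs))
            << " ms\n";
}
```

Under that assumption, InputFinished is recorded about 5.43 ms and InputReceived about 8.83 ms after the span starts, both within its roughly 9.88 ms duration.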
> > >
> > > To get more complete output from OT, you will eventually want to use
> > > the HTTP exporter and export the data to a tool like Jaeger, which can
> > > visualize the data and offer flame charts.
> > >
> > > On Tue, Apr 19, 2022 at 5:39 AM Li Jin <[email protected]> wrote:
> > > >
> > > > Hello!
> > > >
> > > > I am trying to implement a new type of join in the Arrow Compute engine
> > > > (asof join). I have been looking at the code of HashJoinNode and found
> > > > some debug code that seems useful:
> > > >
> > > > e.g.:
> > > >     EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});
> > > >
> > > > But when I try to use similar code in my ExecNode, I got an error:
> > > >
> > > > /home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/asof_join_node.cc:67:9: error: unused variable ‘side’ [-Werror=unused-variable]
> > > >    67 |     int side = (input == inputs_[0]) ? 0 : 1;
> > > >       |         ^~~~
> > > >
> > > > Here is my code:
> > > >   void InputReceived(ExecNode* input, ExecBatch batch) override {
> > > >     int side = (input == inputs_[0]) ? 0 : 1;
> > > >     EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});
> > > >   }
> > > >
> > > > I wonder:
> > > > (1) Is there a special CMake flag I need to pass in to enable the EVENT
> > > > macro?
> > > > (2) What does the EVENT macro do, and where does it output to?
> > > >
> > > > Thanks!
> > > > Li
> > >
>
