Re: [DISCUSS] Policies for Substrait extensions

2022-04-19 Thread Jeroen van Straten
> At the moment there is a version at [2] which I will propose be the official implementation for the Apache Arrow project (although it needs a tiny bit of cleanup to remove a comment reference to C++). Assuming the discussion doesn't raise any significant concerns in the next week or so I'

[RESULT] [VOTE][RUST] Release Apache Arrow Rust 12.0.0 RC1

2022-04-19 Thread Andrew Lamb
The vote passes with 10 +1 votes (3 binding). Thanks as always to everyone who verified and contributed to this release! The release is available here: https://dist.apache.org/repos/dist/release/arrow/arrow-rs-12.0.0 It has also been uploaded to crates.io: https://crates.io/crates/arrow/12.0.0

[C++] [Compute] Question on "EVENT" macro in ExecNode

2022-04-19 Thread Li Jin
Hello! I am trying to implement a new type of join in Arrow Compute engine (asof join). I have been looking at code of HashJoinNode and found some debug code that seems to be useful, e.g.: `EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});` But when I try to use

[C++] output field names in Arrow Substrait

2022-04-19 Thread Yaron Gvili
Hi, We ran into an issue due to the fact that, for intermediate relations, Substrait does not automatically compute output field names nor allow one to explicitly name output fields [1]. This leads to trouble when one needs to refer to these output fields by name [2]. We run into this trouble

Re: [C++] [Compute] Question on "EVENT" macro in ExecNode

2022-04-19 Thread Weston Pace
The EVENT macro is specific to OpenTelemetry tracing. So if `side` is only used to populate the event then I think you will need to surround the entire block with:
```
#ifdef ARROW_WITH_OPENTELEMETRY
int side = ...
EVENT(span_, "InputReceived", {{"batch.length", batch.length}, {"side", side}});
```
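Here is a minimal, self-contained sketch of the guard pattern described above. It assumes, as the reply implies, that a tracing macro expands to nothing when tracing is compiled out. If a variable only feeds that macro and is declared outside the guard, it becomes unused in non-tracing builds. That triggers `-Wunused-variable` and fails under `-Werror`. `TRACE_EVENT`, `WITH_TRACING` and `OnInputReceived` are hypothetical stand-ins, not Arrow's actual `EVENT` macro or API.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for a tracing macro like EVENT. When the hypothetical
// WITH_TRACING flag is defined, it prints the event. Otherwise it expands to a
// no-op, so a variable that only feeds it would become unused.
#ifdef WITH_TRACING
#define TRACE_EVENT(name, length, side)                 \
  do {                                                  \
    std::cout << (name) << " batch.length=" << (length) \
              << " side=" << (side) << '\n';            \
  } while (false)
#else
#define TRACE_EVENT(name, length, side) \
  do {                                  \
  } while (false)
#endif

// Hypothetical input handler; returns the batch length it was given.
int64_t OnInputReceived(int64_t batch_length, bool from_left) {
  assert(batch_length >= 0);
#ifdef WITH_TRACING
  // `side` exists only to populate the event, so its declaration sits inside
  // the same guard as the event itself.
  int side = from_left ? 0 : 1;
  TRACE_EVENT("InputReceived", batch_length, side);
#else
  (void)from_left;  // only needed when tracing is compiled in
#endif
  return batch_length;
}
```

Compiling with `-DWITH_TRACING` prints one line per call. Without it, the tracing code disappears and no unused-variable warning is produced.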

Re: [C++] output field names in Arrow Substrait

2022-04-19 Thread Weston Pace
Hi Yaron, I think you might have forgotten the links for [1][2][3] so I'm not entirely sure of the context. Are you going from Substrait to an Arrow execution plan? Or are you going from an Arrow execution plan to Substrait? For Substrait -> Arrow most of our execution nodes should take in a Fie

Re: [C++] [Compute] Question on "EVENT" macro in ExecNode

2022-04-19 Thread Li Jin
I see. Thanks Weston. This is a nice tracing utility. I will give it a shot, although it might be more information than I actually want; I might just use a print statement. As a side question: what do most Arrow devs use for debugging compute-related code? I am new to this and tried to use pdb but en

Re: [C++] output field names in Arrow Substrait

2022-04-19 Thread Yaron Gvili
Hi Weston, Thanks for the quick response. I think you might have forgotten the links for [1][2][3] Sorry about the confusion; I use these not as references to links but as markers of points I make in the beginning that I elaborate on later, in the places where I reuse the markers. Are you going

Re: [C++] [Compute] Question on "EVENT" macro in ExecNode

2022-04-19 Thread Weston Pace
I can't speak for others but I do normal development with a debug build & UBSAN turned on. I haven't had any problems using gdb in this setup. Usually if I get a release-only bug it's because of timing or memory being reused more aggressively in which case I would at least start with ASAN and TSA

Re: [C++] output field names in Arrow Substrait

2022-04-19 Thread Weston Pace
> However, the problem is there are natural cases in which an execution node should or must take in a string-name If we can come up with such a case then I agree it would be a problem for Substrait's current definition. I don't think we can come up with such a case. Every column that can be re
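One way to picture the position-based view argued above is a minimal model under these assumptions:

- Intermediate relations expose only an ordered list of unnamed output columns.
- Expressions refer to inputs by ordinal position, as Substrait's direct field references do.
- Names are attached once, at the plan root; Substrait's `RelRoot` carries a `names` list.

`Column`, `Relation`, `FieldByIndex`, `Project` and `NameAtRoot` are hypothetical and unrelated to Arrow's or Substrait's real classes. This is an illustration, not how either project implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a relation's output is an ordered list of unnamed
// columns; each column is just a vector of values here.
using Column = std::vector<int>;
using Relation = std::vector<Column>;

// Resolve a field by ordinal position, like a Substrait direct field
// reference. Throws if the index is out of range.
const Column& FieldByIndex(const Relation& rel, std::size_t index) {
  if (index >= rel.size()) throw std::out_of_range("field index out of range");
  return rel[index];
}

// Hypothetical projection: emits the input columns selected by position,
// in the requested order. The output columns remain unnamed.
Relation Project(const Relation& input,
                 const std::vector<std::size_t>& fields) {
  Relation out;
  for (std::size_t i : fields) out.push_back(FieldByIndex(input, i));
  return out;
}

// Names are attached only once, at the root, one per output column.
struct NamedResult {
  std::vector<std::string> names;
  Relation columns;
};

NamedResult NameAtRoot(Relation rel, std::vector<std::string> names) {
  if (names.size() != rel.size()) {
    throw std::invalid_argument("exactly one name per output column");
  }
  return NamedResult{std::move(names), std::move(rel)};
}
```

In this model, a later relation can still reach any column of an intermediate relation, e.g. `Project(Project(scan, {2, 0}), {1})`, without that intermediate relation ever having names. Names become necessary only where the final output is presented.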

Re: [C++] [Compute] Question on "EVENT" macro in ExecNode

2022-04-19 Thread Li Jin
Thanks! This is helpful. Will take a look. (Replying to Weston Pace, Tue, Apr 19, 2022 at 7:00 PM.)