Re: [C++] Kernel function registry evolution

2022-06-29 Thread Wes McKinney
Right, the situations where array and scalar inputs are mixed (e.g. Array + Scalar -> Array) are unaffected. The change will make more sense when you see in the PR how much code I've been able to delete. On Wed, Jun 29, 2022 at 1:03 PM Weston Pace wrote: > > This is only for the situation where A

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Weston Pace
This is only for the situation where ALL inputs and outputs are scalar. Scalars, at the kernel level, do not have length. So in this case there is nothing to repeat. It does build a buffer, but just with a single value, so it is all O(1). On Wed, Jun 29, 2022 at 9:49 AM Antoine Pitrou wrote: >
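The boxing described above can be pictured with a minimal sketch. It assumes toy types (`Int64Scalar`, `Int64Array`) and helper names (`BoxScalar`, `UnboxScalar`, `AddArrays`, `AddScalars`) that are all hypothetical and are not Arrow's actual classes or functions. It only illustrates the idea that an all-scalar call can be routed through an array-only kernel by wrapping each scalar in a one-element array, at O(1) cost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Arrow's classes.
struct Int64Scalar {
  int64_t value;
};

struct Int64Array {
  std::vector<int64_t> values;
};

// "Box" a scalar: build a one-element array. The cost does not depend on
// any batch length, because a scalar has no length to repeat over.
inline Int64Array BoxScalar(const Int64Scalar& s) {
  Int64Array out;
  out.values.push_back(s.value);
  return out;
}

// "Unbox": read the single value back out of a length-1 array.
inline Int64Scalar UnboxScalar(const Int64Array& a) {
  assert(a.values.size() == 1);
  return Int64Scalar{a.values[0]};
}

// A kernel written only for equal-length arrays; it never sees a scalar.
inline Int64Array AddArrays(const Int64Array& left, const Int64Array& right) {
  assert(left.values.size() == right.values.size());
  Int64Array out;
  out.values.reserve(left.values.size());
  for (std::size_t i = 0; i < left.values.size(); ++i) {
    out.values.push_back(left.values[i] + right.values[i]);
  }
  return out;
}

// All-scalar call path: box the inputs, run the array kernel, unbox the result.
inline Int64Scalar AddScalars(const Int64Scalar& a, const Int64Scalar& b) {
  return UnboxScalar(AddArrays(BoxScalar(a), BoxScalar(b)));
}
```

In this sketch, calling `AddScalars(Int64Scalar{2}, Int64Scalar{3})` goes through `AddArrays` on two length-1 arrays, so the kernel itself needs no scalar-specific code path.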

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Antoine Pitrou
Does boxing a scalar into an array actually build a buffer with the repeated value, or is it more efficient than that? On 29/06/2022 at 17:57, Wes McKinney wrote: I'm working on my next PR which addresses the "scalar output modality" and removes usages of ValueDescr and output shapes in g

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Wes McKinney
I'm working on my next PR which addresses the "scalar output modality" and removes usages of ValueDescr and output shapes in general from the kernel execution machinery. The ability to provide all scalar input and scalar output of course is preserved — it's just that the "boxing" and "unboxing" of

Re: [C++] Kernel function registry evolution

2022-06-13 Thread Wes McKinney
I merged the PR a little while ago. Thanks to David and Sasha for helping review. If you have more comments or things you would like to fix from that PR, please comment there and I can help address them in follow-up PRs. I'm going to work next on migrating the rest of the kernels to use ExecSpan (a

Re: [C++] Kernel function registry evolution

2022-06-10 Thread Wes McKinney
PR is up: https://github.com/apache/arrow/pull/13364 Look forward to getting this in since there's a bunch of follow on work that I'd like to get started on ASAP! On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney wrote: > > I'm making good progress getting my branch PR-ready -- working through > the c

Re: [C++] Kernel function registry evolution

2022-06-09 Thread Wes McKinney
I'm making good progress getting my branch PR-ready -- working through the compute-scalar-test suite and fixing the little things I broke. I hope I'll have it done by the end of the week. On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney wrote: > > I created https://issues.apache.org/jira/browse/ARROW-

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Wes McKinney
I created https://issues.apache.org/jira/browse/ARROW-16755 as an umbrella issue to track improvements to reduce overhead in the expression and kernel execution machinery. Please help by attaching related issues and creating new issues for specific individual efforts here. I'll work as quickly as I

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Wes McKinney
This is definitely only the first stage of cleanup and streamlining. I anticipate multiple rounds of refactoring (maybe not as invasive and painful as this one). I'm not sure this patch will do a lot to alleviate bottom-line expression evaluation overhead, but it creates the environment (i.e.

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Antoine Pitrou
On 06/06/2022 at 09:34, Sasha Krassovsky wrote: Wow that's a lot of progress! Definitely agree on the scalar outputs point. One point about the ArraySpan - why does it need to know its data type? Once a kernel has been resolved by the registry, the kernel will only know how to execute on the

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Sasha Krassovsky
Wow that's a lot of progress! Definitely agree on the scalar outputs point. One point about the ArraySpan - why does it need to know its data type? Once a kernel has been resolved by the registry, the kernel will only know how to execute on the specific type it was resolved for, right? If so we ca

Re: [C++] Kernel function registry evolution

2022-06-05 Thread Wes McKinney
I've made good progress on this the last couple days and I'm at the point of writing the new ScalarExecutor that uses the new ArraySpan/ExecSpan concepts, having done the initial "porting" of the scalar kernels to use the new data structures. I'm tired so I'm stopping for today, but I'm going to t

Re: [C++] Kernel function registry evolution

2022-06-03 Thread Wes McKinney
Thanks Sasha — this is helpful. I'm going to take a college try at just the scalar kernels and see what I can accomplish over the next few days — will attempt to get a PR up for review with the C++ tests passing. I'm expecting assorted workarounds for the various kernels that do zero-copy optimizat

Re: [C++] Kernel function registry evolution

2022-06-03 Thread Sasha Krassovsky
Hi all, I’ve been thinking about some sort of refactoring of this registry for a while now, and I’ve written down some thoughts, please leave your comments. https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing

Re: [C++] Kernel function registry evolution

2022-06-03 Thread Weston Pace
That approach looks great and very much in line with some of the stuff we have in light_array.h so I think it's very compatible. If you have the time to push this refactoring through then go for it. Don't let anything I'm saying deter any ongoing efforts. I'm just advocating that we be open to a

Re: [C++] Kernel function registry evolution

2022-06-02 Thread Wes McKinney
On this topic, I actually have started prototyping a new ScalarKernel exec interface that uses a non-owning, shared_ptr-free "ArraySpan" data structure based on some prior conversations https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163 ht
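For readers unfamiliar with the idea, here is a minimal sketch of what a non-owning, shared_ptr-free view over array data might look like. The names `OwningArray`, `Int32Span`, `MakeSpan`, `Slice`, and `Sum` are hypothetical and are not taken from the linked prototype or from Arrow. The sketch assumes a single int32 values buffer and ignores validity bitmaps, types, and child data.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical owning container: keeps its buffer alive through a shared_ptr.
struct OwningArray {
  std::shared_ptr<std::vector<int32_t>> values;
  int64_t offset;
  int64_t length;
};

// Hypothetical non-owning view: a raw pointer plus a length. Copying or
// slicing it touches no reference counts; the owner must outlive the view.
struct Int32Span {
  const int32_t* data;
  int64_t length;
};

// Build a view over the owner's logical range without taking ownership.
inline Int32Span MakeSpan(const OwningArray& arr) {
  return Int32Span{arr.values->data() + arr.offset, arr.length};
}

// Slicing is pointer arithmetic only: no allocation, no atomic increments.
inline Int32Span Slice(Int32Span span, int64_t offset, int64_t length) {
  assert(offset >= 0 && length >= 0 && offset + length <= span.length);
  return Int32Span{span.data + offset, length};
}

// A kernel-style function that consumes the view by value.
inline int64_t Sum(Int32Span span) {
  int64_t total = 0;
  for (int64_t i = 0; i < span.length; ++i) {
    total += span.data[i];
  }
  return total;
}
```

The design trade-off in this sketch is that the view is cheap to create and slice, but lifetime safety becomes the caller's responsibility, since nothing in `Int32Span` keeps the underlying buffer alive.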

Re: [C++] Kernel function registry evolution

2022-06-02 Thread Antoine Pitrou
On 02/06/2022 at 00:02, Weston Pace wrote: I'd like to propose we add a second kernel function registry. There doesn't need to be any user-facing API change. We could probably use an approach like [2] to proxy to the old function registry when the newer registry doesn't contain the asked-f
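
The proxying idea in the quoted proposal can be sketched as a registry that consults an older one when a lookup misses. This is only an illustration under stated assumptions: `RegistrySketch`, `KernelFn`, `Add`, and `Lookup` are hypothetical names, kernels are reduced to `int -> int` functions, and none of this reflects the approach referenced as [2] or Arrow's actual registry API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical kernel signature, reduced to int -> int for illustration.
using KernelFn = std::function<int(int)>;

// Hypothetical registry that falls back to an older registry on a miss.
class RegistrySketch {
 public:
  // `fallback` may be null; if set, it must outlive this registry.
  explicit RegistrySketch(const RegistrySketch* fallback = nullptr)
      : fallback_(fallback) {}

  // Register (or overwrite) a function under `name` in this registry only.
  void Add(const std::string& name, KernelFn fn) {
    functions_[name] = std::move(fn);
  }

  // Look in this registry first; on a miss, proxy to the fallback.
  // Returns nullptr when neither registry knows the name.
  const KernelFn* Lookup(const std::string& name) const {
    auto it = functions_.find(name);
    if (it != functions_.end()) {
      return &it->second;
    }
    return fallback_ != nullptr ? fallback_->Lookup(name) : nullptr;
  }

 private:
  std::unordered_map<std::string, KernelFn> functions_;
  const RegistrySketch* fallback_;
};
```

With this shape, callers keep asking a single registry object for a function by name, and functions can be moved to the newer registry one at a time without the caller noticing which registry served the lookup.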