Another idea would be to have a variant with const-references instead
of shared_ptr. One potential issue with our Datum is that it plays the
dual role of transporting both input and output arguments. With
outputs it's necessary to be able to convey ownership while with
inputs this is less important.

In general, I would say that some additional thought is needed about
our kernel APIs. Some scattered thoughts about this:

* Kernel "binding" (selecting a kernel given input types) is a bit ad
hoc. Many kernels do not have the ability to tell you up front what
type of data will be returned (this is very important in the context
of a query engine runtime)
* It's unclear that having a large collection of
$FUNC(FunctionContext*, Datum $arg0, ..., Datum* out) is sustainable.

Rather, it might be better to have something like:

// We know the "schema" of each argument but don't have any data yet
std::string kernel_name = "$name";
auto bound_kernel = GetKernel(kernel_name, {arg0_shape, arg1_shape}, options);
auto out = bound_kernel->Call({arg0, arg1});

So here it would be

ARROW_ASSIGN_OR_RAISE(auto take_kernel,
                      GetKernel("take", {shapes::chunked_array(int8()),
                                         shapes::chunked_array(int8())}));

So now take_kernel->shape() should tell you that the expected output
is shapes::chunked_array(int8())

And Call should preferably accept reference arguments for its input parameters.

As far as kernel-specific options, we could create a variant that
includes the different kernel-specific options structures, so that
it's not necessary to have different dispatch functions for different
kernels.

Anyway, just some out loud thoughts, but while we are in this
intermediate experimental state I'm supportive of you making the
decision that is most convenient for the immediate problem you want to
solve with the caveat that things may be refactored significantly in
the proximate future.

- Wes

On Tue, Apr 7, 2020 at 10:42 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> I did a bit more research on JIRA and we seem to have this open topic there 
> also in https://issues.apache.org/jira/browse/ARROW-6959 which is the similar 
> topic as my mail is about and in 
> https://issues.apache.org/jira/browse/ARROW-7009 we wanted to remove some of 
> the interfaces with reference-types.
>
> On Tue, Apr 7, 2020, at 1:00 PM, Uwe L. Korn wrote:
> > Hello all,
> >
> > I'm in the progress of changing the implementation of the Take kernel
> > to work on ChunkedArrays without concatenating them into a single Array
> > first. While working on the implementation, I realised that we switch
> > often between Datum and the specific-typed parameters. This works quite
> > well for the combination of Array& and Datum(shared_ptr<ArrayData>) as
> > here the reference object with type Array& always carries a shared
> > reference with it, so switching between Array& and its Datum is quite
> > easy.
> >
> > In contrast, we cannot do this with ChunkedArrays as here the Datum
> > requires a shared_ptr<ChunkedArray> which cannot be constructed from
> > the reference type. Thus to allow interfaces like `Status
> > Take(FunctionContext* ctx, const ChunkedArray& values, const Array&
> > indices,` to pass successfully their arguments to the Kernel
> > implementation, we have to do:
> >
> > a) Remove the references from the interface of the Take() function and
> > use `shared_ptr` instances everywhere.
> > b) Add interfaces to kernels like the TakeKernel that allow calling
> > with specific references instead of Datum instances
> >
> > Personally I would prefer b) as this allow us to make more use of the
> > C++ type system and would also avoid the shared_ptr overhead where not
> > necessary.
> >
> > Cheers,
> > Uwe
> >

Reply via email to