Another idea would be to have a variant with const-references instead of shared_ptr. One potential issue with our Datum is that it plays the dual role of transporting both input and output arguments. With outputs it's necessary to be able to convey ownership while with inputs this is less important.
In general, I would say that some additional thought is needed about our kernel APIs. Some scattered thoughts about this: * Kernel "binding" (selecting a kernel given input types) is a bit ad hoc. Many kernels do not have the ability to tell you up front what type of data will be returned (this is very important in the context of a query engine runtime) * It's unclear that having a large collection of $FUNC(FunctionContext*, Datum $arg0, ..., Datum* out) is sustainable. Rather, it might be better to have something like: // We know the "schema" of each argument but don't have any data yet std::string kernel_name = "$name"; auto bound_kernel = GetKernel(kernel_name, {arg0_shape, arg1_shape}, options); auto out = bound_kernel->Call({arg0, arg1}); So here it would be ARROW_ASSIGN_OR_RAISE(auto take_kernel, GetKernel("take", {shapes::chunked_array(int8()), shapes::chunked_array(int8())})); So now take_kernel->shape() should tell you that the expected output is shapes::chunked_array(int8()) And Call should preferably accept reference arguments for its input parameters. As far as kernel-specific options, we could create a variant that includes the different kernel-specific options structures, so that it's not necessary to have different dispatch functions for different kernels. Anyway, just some out loud thoughts, but while we are in this intermediate experimental state I'm supportive of you making the decision that is most convenient for the immediate problem you want to solve with the caveat that things may be refactored significantly in the proximate future. - Wes On Tue, Apr 7, 2020 at 10:42 AM Uwe L. Korn <uw...@xhochy.com> wrote: > > I did a bit more research on JIRA and we seem to have this open topic there > also in https://issues.apache.org/jira/browse/ARROW-6959 which is the similar > topic as my mail is about and in > https://issues.apache.org/jira/browse/ARROW-7009 we wanted to remove some of > the interfaces with reference-types. > > On Tue, Apr 7, 2020, at 1:00 PM, Uwe L. Korn wrote: > > Hello all, > > > > I'm in the progress of changing the implementation of the Take kernel > > to work on ChunkedArrays without concatenating them into a single Array > > first. While working on the implementation, I realised that we switch > > often between Datum and the specific-typed parameters. This works quite > > well for the combination of Array& and Datum(shared_ptr<ArrayData>) as > > here the reference object with type Array& always carries a shared > > reference with it, so switching between Array& and its Datum is quite > > easy. > > > > In contrast, we cannot do this with ChunkedArrays as here the Datum > > requires a shared_ptr<ChunkedArray> which cannot be constructed from > > the reference type. Thus to allow interfaces like `Status > > Take(FunctionContext* ctx, const ChunkedArray& values, const Array& > > indices,` to pass successfully their arguments to the Kernel > > implementation, we have to do: > > > > a) Remove the references from the interface of the Take() function and > > use `shared_ptr` instances everywhere. > > b) Add interfaces to kernels like the TakeKernel that allow calling > > with specific references instead of Datum instances > > > > Personally I would prefer b) as this allow us to make more use of the > > C++ type system and would also avoid the shared_ptr overhead where not > > necessary. > > > > Cheers, > > Uwe > >