Hi Sven,

On Mon, Apr 20, 2020 at 11:49 PM Sven Wagner-Boysen
<sven.wagner-boy...@signavio.com> wrote:
>
> Hi Wes,
>
> I think reducing temporary memory allocation is a great effort and will
> show great benefits in compute-intensive scenarios.
>
> As we mainly work with the Rust and DataFusion parts of the Arrow
> project, I was wondering how we could best align the concepts and
> implementations at that level. Is the approach that the compute kernels
> have to be implemented separately for every language as well, similar to
> the format (which is my current understanding), or do you see options
> for re-use and a common/shared implementation?
>
> Happy to get your thoughts on that.
Since we have the C data interface now, it would be relatively easy to
build a bridge for Rust to call functions written in C++ or vice versa.
However, since this is a volunteer-driven project, there is only so much
that I personally have agency over, and I intend to do development in
C++. Anyone who wants to coordinate cross-language code reuse is more
than welcome to do so, but I do not personally plan to at this time.

> Thanks,
> Sven
>
> On Sun, Apr 19, 2020 at 2:14 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > I started a brain dump of some issues that come to mind around kernel
> > implementation and array expression evaluation. I'll try to fill this
> > out, and it would be helpful to add supporting citations to other
> > projects about what kinds of issues come up and what implementation
> > strategies may be helpful. If anyone would like edit access please
> > let me know; otherwise feel free to add comments.
> >
> > https://docs.google.com/document/d/1LFk3WRfWGQbJ9uitWwucjiJsZMqLh8lC1vAUOscLtj8/edit?usp=sharing
> >
> > On Sat, Apr 18, 2020 at 4:41 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > hi folks,
> > >
> > > This e-mail comes in the context of two C++ data processing
> > > subprojects we have discussed in the past:
> > >
> > > * Data Frame API
> > >   https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit
> > > * In-memory Query Engine
> > >   https://docs.google.com/document/d/10RoUZmiMQRi_J1FcPeVAUAMJ6d_ZuiEbaM2Y33sNPu4/edit
> > >
> > > One important area for both of these projects is the evaluation of
> > > array expressions found inside projections (i.e. "SELECT" -- without
> > > any aggregate functions), filters (i.e. "WHERE"), and join clauses.
> > > In summary, these are mostly elementwise functions that do not alter
> > > the size of their input arrays.
> > > For conciseness I'll call these kinds of expressions "array-exprs".
> > >
> > > Gandiva does LLVM code generation for evaluating array-exprs, but I
> > > think it makes sense to have both codegen'd expressions as well as
> > > X100-style [1] interpreted array-expr evaluation. These modes of
> > > query evaluation should ideally share as much common implementation
> > > code as possible (taking into account Gandiva's need to compile to
> > > LLVM IR).
> > >
> > > I have reviewed the contents of our arrow/compute/kernels directory
> > > and made the following spreadsheet:
> > >
> > > https://docs.google.com/spreadsheets/d/1oXVy29GT1apM4e3isNpzk497Re3OqM8APvlBZCDc8B8/edit?usp=sharing
> > >
> > > There are some problems with our current collection of kernels in
> > > the context of array-expr evaluation in query processing:
> > >
> > > * For efficiency, kernels used for array-expr evaluation should
> > >   write into preallocated memory as their default mode. This
> > >   enables the interpreter to avoid temporary memory allocations and
> > >   improve CPU cache utilization. Almost none of our kernels are
> > >   implemented this way currently.
> > > * The current approach for expr-type kernels of having a top-level
> > >   memory-allocating function does not scale for binding developers.
> > >   I believe instead that kernels should be selected and invoked
> > >   generically by using the string name of the kernel.
> > >
> > > On this last point, what I am suggesting is that we do something
> > > more like:
> > >
> > >   ASSIGN_OR_RAISE(auto kernel, compute::GetKernel("greater", {type0, type1}));
> > >   ArrayData* out = ...;
> > >   RETURN_NOT_OK(kernel->Call({arg0, arg1}, &out));
> > >
> > > In particular, when we reason that successive kernel invocations
> > > can reuse memory, we can have code that does, in essence:
> > >
> > >   k1->Call({arg0, arg1}, &out)
> > >   k2->Call({out}, &out)
> > >   k3->Call({arg2, out}, &out)
> > >
> > > Of course, if you want the kernel to allocate memory then you can
> > > do something like:
> > >
> > >   ASSIGN_OR_RAISE(out, kernel->Call({arg0, arg1}));
> > >
> > > And if you want to avoid the "GetKernel" business you should be
> > > able to do:
> > >
> > >   ASSIGN_OR_RAISE(auto result, ExecKernel(name, {arg0, arg1}));
> > >
> > > I think this "ExecKernel" function should likely replace the public
> > > APIs for running specific kernels that we currently have.
> > >
> > > One detail not addressed above is what to do with kernel-specific
> > > configuration options. One way to address that is to have a common
> > > base type for all options so that we can do:
> > >
> > >   struct MyKernelOptions : public KernelOptions {
> > >     ...
> > >   };
> > >
> > > and then:
> > >
> > >   MyKernelOptions options = ...;
> > >   out = ExecKernel(name, args, options);
> > >
> > > Maybe there are some other ideas.
> > >
> > > At some point we will need to have an implementation blitz where we
> > > go from our current 20 or so non-codegen'd kernels for array-exprs
> > > to several hundred, so I wanted to discuss these issues so we can
> > > all get on the same page.
> > > I'd like to take a crack at an initial iteration of the above API
> > > proposal with a centralized kernel registry (which will also need
> > > to support having UDFs) so we can begin porting the existing
> > > array-expr kernels to use the new API, and then have a clear path
> > > forward for expanding our set of supported functions (which will
> > > ideally also involve sharing implementation code with Gandiva,
> > > particularly its "precompiled" directory).
> > >
> > > Thanks,
> > > Wes
> > >
> > > [1]: http://cidrdb.org/cidr2005/papers/P19.pdf
>
> --
> *Sven Wagner-Boysen* | Lead Software Engineer
> T: +49 160 930 70 6 70 | www.signavio.com
> Kurfürstenstraße 111, 10787 Berlin, Germany
>
> HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
> Managing Directors: Dr. Gero Decker, Daniel Rosenthal