Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-25 Thread Wes McKinney
I have spent some time going through the JIRA backlog and have organized an umbrella JIRA with about 75 issues under it to help organize building out further compute kernels and kernel execution functionality: https://issues.apache.org/jira/browse/ARROW-8894 On Sun, May 24, 2020 at 9:36 AM Wes Mc

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-24 Thread Wes McKinney
I have merged the patch but left the PR open for additional code review. On Sat, May 23, 2020 at 3:24 PM Wes McKinney wrote: > > To be clear given the scope of code affected I think we should merge it today > and address further feedback in a follow up patch. I will be diligent about > respondi

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-23 Thread Wes McKinney
To be clear, given the scope of code affected, I think we should merge it today and address further feedback in a follow-up patch. I will be diligent about responding to additional comments in the PR. On Sat, May 23, 2020, 3:19 PM Wes McKinney wrote: > Yes you should still be able to comment. I wil

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-23 Thread Wes McKinney
Yes, you should still be able to comment. I will reopen the PR after it is merged. On Sat, May 23, 2020, 2:52 PM Micah Kornfield wrote: > Hi Wes, > Will we still be able to comment on the PR once it is closed? > > > If we want to be inclusive on feedback it might pay to wait until Tuesday > evenin

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-23 Thread Micah Kornfield
Hi Wes, Will we still be able to comment on the PR once it is closed? If we want to be inclusive on feedback it might pay to wait until Tuesday evening US time to merge since it is a long weekend here. Thanks, Micah On Saturday, May 23, 2020, Wes McKinney wrote: > Hi folks -- I've addressed a

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-23 Thread Wes McKinney
Hi folks -- I've addressed a good deal of feedback, added a lot of comments, and with Kou's help have got the build passing. It would be great if this could be merged soon to unblock follow-up PRs. On Wed, May 20, 2020 at 11:55 PM Wes McKinney wrote: > > I just opened the PR https://github.com/a

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-20 Thread Wes McKinney
I just opened the PR https://github.com/apache/arrow/pull/7240 I'm sorry it's so big. I really think this is the best way. The only further work I plan to do on it is to get the CI passing. On Wed, May 20, 2020 at 12:26 PM Wes McKinney wrote: > > I'd guess I'm < 24 hours away from putting up my

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-20 Thread Wes McKinney
I'd guess I'm < 24 hours away from putting up my initial PR for this work. Since the work is large and (for all practical purposes) nearly impossible to split into independently merge-ready PRs, I'll start a new e-mail thread describing what I've done in more detail and proposing a path for merging

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-11 Thread Wes McKinney
I'm working actively on this but, perhaps as expected, it has ballooned into a very large project -- it's unclear at the moment whether I'll be able to break the work into smaller patches that are easier to digest. I'm working as fast as I can to have an initial feature-preserving PR up, but the chan

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-22 Thread Wes McKinney
On Wed, Apr 22, 2020 at 12:41 AM Micah Kornfield wrote: > > Hi Wes, > I haven't had time to read the doc, but wanted to ask some questions on > points raised on the thread. > > * For efficiency, kernels used for array-expr evaluation should write > > into preallocated memory as their default mode.

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Micah Kornfield
Hi Wes, I haven't had time to read the doc, but wanted to ask some questions on points raised on the thread. * For efficiency, kernels used for array-expr evaluation should write > into preallocated memory as their default mode. This enables the > interpreter to avoid temporary memory allocations

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Wes McKinney
On Tue, Apr 21, 2020 at 7:32 AM Antoine Pitrou wrote: > > > Le 21/04/2020 à 13:53, Wes McKinney a écrit : > >> > >> That said, in the SortToIndices case, this wouldn't be a problem, since > >> only the second pass writes to the output. > > > > This kernel is not valid for normal array-exprs (see t

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Antoine Pitrou
Le 21/04/2020 à 13:53, Wes McKinney a écrit : >> >> That said, in the SortToIndices case, this wouldn't be a problem, since >> only the second pass writes to the output. > > This kernel is not valid for normal array-exprs (see the spreadsheet I > linked), such as what you can write in SQL > > K

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Wes McKinney
hi Antoine, On Tue, Apr 21, 2020 at 4:54 AM Antoine Pitrou wrote: > > > Le 21/04/2020 à 11:13, Antoine Pitrou a écrit : > > > It would be interesting to know how costly repeated > allocation/deallocation is. Modern allocators like jemalloc do their > own caching instead of always returning memo

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Wes McKinney
Hi Sven, On Mon, Apr 20, 2020 at 11:49 PM Sven Wagner-Boysen wrote: > > Hi Wes, > > I think reducing temporary memory allocation is a great effort and will > show great benefit in compute intensive scenarios. > As we are mainly working with the Rust and Datafusion part of the Arrow > project I wa

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Antoine Pitrou
Le 21/04/2020 à 11:13, Antoine Pitrou a écrit : > > This assumes that all these kernels can safely write into one of their > inputs. This should be true for trivial ones, but not if e.g. a kernel > makes two passes over its input. For example, the SortToIndices kernel > first scans the input f

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Antoine Pitrou
Hi Wes, Le 18/04/2020 à 23:41, Wes McKinney a écrit : > > There are some problems with our current collection of kernels in the > context of array-expr evaluation in query processing: > > * For efficiency, kernels used for array-expr evaluation should write > into preallocated memory as their

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-20 Thread Sven Wagner-Boysen
Hi Wes, I think reducing temporary memory allocation is a great effort and will show great benefit in compute intensive scenarios. As we are mainly working with the Rust and Datafusion part of the Arrow project I was wondering how we could best align the concepts and implementations on that level.

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-18 Thread Wes McKinney
I started a brain dump of some issues that come to mind around kernel implementation and array expression evaluation. I'll try to fill this out, and it would be helpful to add supporting citations to other projects about what kinds of issues come up and what implementation strategies may be helpful

[C++] Revamping approach to Arrow compute kernel development

2020-04-18 Thread Wes McKinney
hi folks, This e-mail comes in the context of two C++ data processing subprojects we have discussed in the past * Data Frame API https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit * In-memory Query Engine https://docs.google.com/document/d/10RoUZmiMQRi_J1FcPeVAUA