On #2, I think this discussion might be overly speculative until a
full-fledged multithreaded hash aggregation is implemented in the
Apache Arrow C++ library. There are other analytic database systems in
the wild which might provide a blueprint for the way that we should
approach this, and I don't
1: Excellent!
2: The open JIRA for grouped aggregation is
https://issues.apache.org/jira/browse/ARROW-4124
(though it's out of date since it predates the addition of
ScalarAggregateKernel).
To summarize: for *grouped* aggregation we want the kernel to do the work
of evaluating group
condition(s) a
1. This is great. I will follow this JIRA. (better yet, I'll see if I can
make that contribution)
2. If we forget about the multithreading case for a moment, this
requirement came up while implementing a "groupby + aggregation" operation
(single-threaded). Let's assume that a table is not sorted.
Hi Niranda,
hastebin: That looks generally correct, though I should warn you that a
recent PR
( https://github.com/apache/arrow/pull/8574 ) changed the return type of
DispatchExact to Kernel so you'll need to insert an explicit cast to
ScalarAggregateKernel.
1: This seems like a feature which mig
Hi Ben and Wes,
Based on our discussion, I did the following.
https://hastebin.com/ajadonados.cpp
It seems to be working fine. Would love to get your feedback on this! :-)
But I have a couple of concerns.
1. Say I want to communicate the intermediate state data across multiple
processes. Unfortun
Yes, open a Jira and propose a PR implementing the changes you need
On Mon, Nov 9, 2020 at 8:31 PM Niranda Perera wrote:
>
> @wes How should I proceed with this nevertheless? should I open a JIRA?
>
> On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney wrote:
>
> > On Mon, Nov 9, 2020 at 9:32 AM Nirand
@wes How should I proceed with this nevertheless? should I open a JIRA?
On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney wrote:
> On Mon, Nov 9, 2020 at 9:32 AM Niranda Perera
> wrote:
> >
> > @Ben
> > Thank you very much for the feedback. But unfortunately, I was unable to
> > find a header that e
I see. So, IIUC, I would have to get the "mean" Function from the function
registry, cast it to a ScalarAggregateFunction, and access the kernels_
vector which ultimately holds the ScalarAggregateKernel? :-)
Or can I use this ScalarAggregateFunction::DispatchExact function for this?
On Mon, Nov 9
Hi Niranda,
> unfortunately, I was unable to find a header that exposes a
SumAggregateKernel
Sorry, I misspelled that! Should have written ScalarAggregateKernel, which
is in kernel.h
On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney wrote:
> On Mon, Nov 9, 2020 at 9:32 AM Niranda Perera
> wrote:
>
On Mon, Nov 9, 2020 at 9:32 AM Niranda Perera wrote:
>
> @Ben
> Thank you very much for the feedback. But unfortunately, I was unable to
> find a header that exposes a SumAggregateKernel in the v2.0.0. Maybe I am
> checking it wrong. I remember accessing them in v0.16 IINM.
>
> @Wes
> Yes, that wo
@Ben
Thank you very much for the feedback. But unfortunately, I was unable to
find a header that exposes a SumAggregateKernel in the v2.0.0. Maybe I am
checking it wrong. I remember accessing them in v0.16 IINM.
@Wes
Yes, that would be great. How about adding a CMake compilation flag for
such dev
I'm not opposed to installing headers that provide access to some of
the kernel implementation internals (with the caveat that changes
won't go through a deprecation cycle, so caveat emptor). It might be
more sustainable to think about what kind of stable-ish public API
could be exported to support
Hi Niranda,
SumImpl is a subclass of KernelState. Given a SumAggregateKernel, one can
produce zeroed KernelState using the `init` member, then operate on data
using the `consume`, `merge`, and `finalize` members. You can look at
ScalarAggExecutor for an example of how to get from a compute functio
Hi Ben,
We are building a distributed table abstraction on top of Arrow dataframes
called Cylon (https://github.com/cylondata/cylon). Currently we have a
simple aggregation and group-by operation implementation. But we felt like
we can give more functionality if we can import arrow kernels and sta
Ni Niranda,
What is the context of your work? if you're working inside the arrow
repository you shouldn't need to install headers before using them, and we
welcome PRs for new kernels. Otherwise, could you provide some details
about how your work is using Arrow as a dependency?
Ben Kietzman
On S
15 matches
Mail list logo