Yes, what you’ve said is correct for Mean. But my point earlier is that there
should only be a few of such special cases. A simple case would be e.g. Max,
where Aggregate outputs Max and then merge outputs Max(Max).
Sasha
> 6 июля 2023 г., в 23:13, Jiangtao Peng написал(а):
>
>
> Sorry fo
Sorry for my unclear expression.
Take mean aggregation as example, does Aggregate "output" sum and count value,
and Accumulate will "input" sum and count value, then "merge"
sum(sum)/sum(count) as "output"?
My point is how to implement Pre-Aggregation and Post-Aggregation using Acero.
Best,
Jia
Can you clarify what you mean by “data flow”? Each machine will be executing
the same query plan. The query plan will contain an operator called Shuffle,
and above the Shuffle will be an Aggregate, and above that will be an
Accumulate node. The SourceNode will read data from disk on each machine
Sure, distributed aggregations can be split into “compute” and “merge” two
phase. But how about data flow of “compute” and “merge” on different nodes?
Best,
Jiangtao
From: Sasha Krassovsky
Date: Friday, July 7, 2023 at 11:07 AM
To: user@arrow.apache.org
Subject: Re: [C++][Acero] can Acero sup
Distributed aggregations have two phases: “compute” (each node does its own
aggregation) and “merge” (the master merged the partial results). For most
aggregates (e.g. Sum, Min, Max), merge is just the same as compute, except over
the partial results. However, for Mean in particular, the compute
Hi Sasha,
Thanks for your reply!
Maybe Shuffle node is enough for data distribution. How about aggregation node?
For example, mean kernel with group will maintain sum and count value for each
group. Mater node merging mean result needs sum and count value on each
partition. But mean kernel see
Hi Jiangtao,
Acero doesn’t support any distributed computation on its own. However, to get
some simple distributed computation going it would be sufficient to add a
Shuffle node. For example for Aggregation, the Shuffle would assign a range of
hashes to each node, and then each node would hash-p
Hi there,
I'm learning Acero streaming execution engine recently. And I’m wondering if
Acero support distributed computing.
I have read code about aggregation node and kernel; Aggregation kernel seems to
hide the details of aggregation middle state. If use multiple nodes with Acero
execution e
Hi Gus, did you ever get an answer to your questions?
>From a look at the source code, neither the CSV reader or builders
look goroutine safe. However, your usage of the CSV reader above looks
safe to me because 'record' gets copied into each goroutine
invocation. Importantly, the builder would ne