I don't know of any work being done to turn Acero into a distributed
query engine.

However, I would hope that Acero can be used in a distributed query
engine, and would be a useful component.

If there are features that Acero would need in this environment (e.g.
some kind of exec node for specialized transmission or partitioned
transmission) then please feel free to create JIRA tickets describing
what you would like to see.

On Wed, Aug 24, 2022 at 7:18 AM Niranda Perera <niranda.per...@gmail.com> wrote:
>
> Hi Jayeet,
>
> AFAIU, Acero work mainly focuses on single node multithreaded execution
> based on morsel driven parallelism [1].
> In your case, there are multiple options IMO. Ex. just use 2 nodes which do
> filtering parallely, and then node0 does the join (this reduces
> communication).  Better yet, if you could use distributed memory
> computation for the hash join which uses both nodes (which is not supported
> yet in arrow). There are several other compute engines that support these
> types of execution on top of arrow dataformat (eg: Cylon which I'm working
> on ATM)
>
> [1] https://dl.acm.org/doi/abs/10.1145/2588555.2610507
>
> On Wed, Aug 24, 2022 at 10:00 AM Jayjeet Chakraborty <
> jayjeetchakrabort...@gmail.com> wrote:
>
> > Hi Arrow Community,
> >
> > With the release of Acero, we were wondering if Acero can be used in a
> > distributed environment as for now it looks like Acero is only intended for
> > a local context. For example, if we have a query plan with a hash join node
> > at the root and multiple filter project nodes on each sides of the tree,
> > each side having a data source, how can we distribute the query plan
> > between 3 nodes: 2 nodes containing data sources and executing the filter
> > and project parts of the query plan in parallel while 1 node serving as the
> > compute node, performing only the join operation on the results from the
> > other 2 nodes. As per my understanding, we need some form of RPC mechanism
> > between the ExecNodes of an ExecPlan and would probably be integrated
> > within the Flight framework. Is that the right way to think about it ? Do
> > you think that is something the Arrow community would be interested in if
> > not already planning for it ? Thanks.
> >
> > Jayjeet Chakraborty
> >
> >
> > --
> > *Jayjeet Chakraborty*
> > CS PhD student
> > UC Santa Cruz
> > California, USA
> >
>
>
> --
> Niranda Perera
> https://niranda.dev/
> @n1r44 <https://twitter.com/N1R44>

Reply via email to