Re: Using Acero in a distributed environment

2022-08-31 Thread Aldrin
I am slowly but surely building up to something like that. [1] is my earlier progress, using computational storage drives, but I have only run it on a single drive so far. [2] is where I will be trying to do something more generic, but using Flight RPC (instead of the Kinetic protocol) and Substrait + a …

Re: Using Acero in a distributed environment

2022-08-31 Thread Jayjeet Chakraborty
Thanks a lot for your reply, Niranda and Weston.
On Thu, Aug 25, 2022 at 1:31 AM Weston Pace wrote:
> I don't know of any work being done to turn Acero into a distributed
> query engine.
>
> However, I would hope that Acero can be used in a distributed query
> engine, and would be a useful compo…

Re: Using Acero in a distributed environment

2022-08-24 Thread Weston Pace
I don't know of any work being done to turn Acero into a distributed query engine. However, I would hope that Acero can be used in a distributed query engine, and would be a useful component. If there are features that Acero would need in this environment (e.g. some kind of exec node for speciali…

Re: Using Acero in a distributed environment

2022-08-24 Thread Niranda Perera
Hi Jayjeet, AFAIU, Acero mainly focuses on single-node, multithreaded execution based on morsel-driven parallelism [1]. In your case, there are multiple options IMO. For example, just use 2 nodes that do the filtering in parallel, and then have node0 do the join (this reduces communication). Better yet, if you …
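To make the "filter on two nodes, join on node0" option concrete, here is a minimal pure-Python sketch. It is not Acero code: the two worker threads only stand in for two machines, and the tables, predicates, and function names (`filter_on_node`, `hash_join`, `run_query`) are hypothetical. The point it illustrates is that each node applies its filter where the data lives, so only the matching rows would need to travel to node0, which then runs an ordinary inner hash join.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each list stands in for a table stored on a different node.
NODE1_ORDERS = [
    {"order_id": 1, "cust_id": 10, "amount": 250},
    {"order_id": 2, "cust_id": 11, "amount": 40},
    {"order_id": 3, "cust_id": 10, "amount": 90},
]
NODE2_CUSTOMERS = [
    {"cust_id": 10, "region": "EU"},
    {"cust_id": 11, "region": "US"},
    {"cust_id": 12, "region": "EU"},
]


def filter_on_node(rows, predicate):
    """Runs where the data lives; only matching rows would be sent to node0."""
    return [row for row in rows if predicate(row)]


def hash_join(build_side, probe_side, key):
    """Inner hash join: build a hash table on one input, then probe it with the other."""
    table = {}
    for row in build_side:
        table.setdefault(row[key], []).append(row)
    joined = []
    for row in probe_side:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def run_query():
    # Two worker threads stand in for the two filtering nodes running in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        orders_future = pool.submit(filter_on_node, NODE1_ORDERS, lambda r: r["amount"] > 50)
        customers_future = pool.submit(filter_on_node, NODE2_CUSTOMERS, lambda r: r["region"] == "EU")
        orders = orders_future.result()
        customers = customers_future.result()
    # "node0": join the already-filtered inputs, so less data crosses the network.
    return hash_join(customers, orders, "cust_id")


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

With this toy data, `run_query()` returns orders 1 and 3, both joined to EU customer 10. Building the hash table on the smaller filtered side (here the customers) is the usual choice, since it keeps the in-memory table on node0 small.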

Using Acero in a distributed environment

2022-08-24 Thread Jayjeet Chakraborty
Hi Arrow Community, With the release of Acero, we were wondering whether Acero can be used in a distributed environment, since for now it looks like Acero is only intended for a local context. For example, if we have a query plan with a hash join node at the root and multiple filter and project nodes on each s…
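The plan shape described above (a hash join at the root, with filter and project nodes sitting on each server) can be split into per-server fragments plus a small coordinator plan. The sketch below is a hypothetical illustration, not Acero's plan API: `PlanNode`, the `"receive"` placeholder, and the `"server"` scan parameter are all made up for this example. In a real system the shipped fragments might be serialized (for instance as Substrait) and their results streamed back (for instance over Flight), but that wiring is assumed here and not shown.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    kind: str                                     # e.g. "scan", "filter", "project", "hash_join"
    params: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)


def servers_under(node):
    """Collect the servers whose scans feed this subtree (hypothetical "server" param)."""
    found = set()
    if node.kind == "scan":
        found.add(node.params["server"])
    for child in node.inputs:
        found |= servers_under(child)
    return found


def fragment(root):
    """Cut the plan so every single-server subtree runs remotely.

    Returns the coordinator's plan (with "receive" placeholders) and a list of
    (server, subtree) fragments to ship to the servers.
    """
    remote = []

    def split(node):
        servers = servers_under(node)
        if len(servers) == 1:
            remote.append((servers.pop(), node))
            return PlanNode("receive", {"fragment": len(remote) - 1})
        return PlanNode(node.kind, node.params, [split(c) for c in node.inputs])

    return split(root), remote


def server_branch(server, predicate, columns):
    # scan -> filter -> project, all executed on one server
    scan = PlanNode("scan", {"server": server, "table": "t"})
    filt = PlanNode("filter", {"predicate": predicate}, [scan])
    return PlanNode("project", {"columns": columns}, [filt])


plan = PlanNode(
    "hash_join",
    {"keys": ["id"]},
    [server_branch("server1", "x > 5", ["id", "x"]),
     server_branch("server2", "y < 3", ["id", "y"])],
)

local_plan, fragments = fragment(plan)
print(local_plan.kind, [c.kind for c in local_plan.inputs])
print([(server, frag.kind) for server, frag in fragments])
```

Running it prints `hash_join ['receive', 'receive']` followed by `[('server1', 'project'), ('server2', 'project')]`: each filter/project branch is pushed to the server that holds its data, and only the join remains on the coordinator. Cutting at the highest single-server subtree is one simple heuristic; a real planner would also weigh data sizes and where to place the join.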