The outcome of this discussion is that I wasn't able to make a good case for this, so I am abandoning this effort. There aren't enough compelling reasons to do it, and it would just cause confusion.
I appreciate having the opportunity to have the discussion.

Thanks,

Andy.

On Mon, Jul 22, 2019 at 11:50 AM Andy Grove <andygrov...@gmail.com> wrote:

> Thanks, Jacques and Wes.
>
> I agree that this needs discussion and a design document. I have put together this Google doc to get the ball rolling:
>
> https://docs.google.com/document/d/1Uv1FmPs7uYMLoJUH1EF0oxm-ujtz1h1tJFl0zN60TIg/edit?usp=sharing
>
> Thanks,
>
> Andy.
>
> On Mon, Jul 22, 2019 at 6:39 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> I agree that I'd also like to see a design / goals document to clarify the scope (and the non-goals, too)
>>
>> In general, I would hesitate to add anything higher level to the Gandiva protos -- there is already confusion from people who believe that Gandiva is a "query engine" where it is actually a query engine subsystem (execution kernel compiler/generator). See for example the thread just a week ago [1]
>>
>> If you add higher level query plan structures to the proto file, I fear it will generate more confusion. If the plan ends up being to have a larger proto file, it would be good to move it someplace that isn't Gandiva-specific and clearly indicate that Gandiva is responsible for code generation for certain structures in the proto. We can also address some of these issues through better project documentation and READMEs.
>>
>> [1]: https://lists.apache.org/thread.html/212db05e98549f5938f3af41dade51d7a3e47255178a6c76652adc79@%3Cdev.arrow.apache.org%3E
>>
>> On Sun, Jul 21, 2019 at 4:23 PM Jacques Nadeau <jacq...@apache.org> wrote:
>> >
>> > Some thoughts:
>> >
>> > 1. I think it would make sense to start with a design discussion/document about the goals and what we think is implementation specific versus generally applicable. In general, a distributed execution plan seems pretty implementation specific. My sense is that you'd never run a distributed execution plan outside of the knowledge of the particular execution environment it is running within. Part of that is usually distributed execution also includes lifecycle management. For example, if you're going to have work-stealing or early termination in your execution engine, those are operations that stitch into execution coordination (and thus a specific impl). If distributed execution is always engine specific, why try to create a general one for multiple engines?
>> > 2. With regards to making Gandiva protos more generic: I'd like to see more clarity on #1. On one hand, extending things so they are reused is good. On the other hand, the more consumers of an interface, the more overloads/non-impls you have for each consumer of it.
>> >
>> > On Sat, Jul 20, 2019 at 10:18 AM Andy Grove <andygrov...@gmail.com> wrote:
>> >
>> > > I recently created a small PoC of distributed query execution on Kubernetes using the Rust implementation of Apache Arrow and the DataFusion query engine [1].
>> > >
>> > > This PoC uses gRPC to pass query plans to executor nodes and the proto file [2] is largely based on the Gandiva proto file [3]. The PoC is very basic but I think it demonstrates the power of having query plans as part of the proto file. This would allow distributed applications to be built based on Arrow standards in a way that is not dependent on any particular implementation of Arrow and would even allow mixing and matching query engines.
>> > >
>> > > I wanted to start this discussion to see what the appetite is here for accepting PRs to add query plan structures to the Gandiva proto file and also whether we can consider making this an Arrow proto file rather than being Gandiva-specific, over time.
>> > >
>> > > Thanks,
>> > >
>> > > Andy.
>> > >
>> > > [1] https://github.com/andygrove/ballista
>> > >
>> > > [2] https://github.com/andygrove/ballista/blob/master/proto/ballista/ballista.proto
>> > >
>> > > [3] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/proto/Types.proto