The outcome of this discussion is that I wasn't able to make a good case for
this, so I am abandoning the effort. There aren't enough compelling reasons
to do this, and it would just cause confusion.

I appreciate having the opportunity to have the discussion.

Thanks,

Andy.


On Mon, Jul 22, 2019 at 11:50 AM Andy Grove <andygrov...@gmail.com> wrote:

> Thanks, Jacques and Wes.
>
> I agree that this needs discussion and a design document. I have put
> together this Google doc to get the ball rolling:
>
>
> https://docs.google.com/document/d/1Uv1FmPs7uYMLoJUH1EF0oxm-ujtz1h1tJFl0zN60TIg/edit?usp=sharing
>
> Thanks,
>
> Andy.
>
> On Mon, Jul 22, 2019 at 6:39 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> I agree; I'd also like to see a design/goals document to clarify
>> the scope (and the non-goals, too).
>>
>> In general, I would hesitate to add anything higher-level to the
>> Gandiva protos. There is already confusion from people who believe
>> that Gandiva is a "query engine" when it is actually a query engine
>> subsystem (execution kernel compiler/generator). See, for example, the
>> thread from just a week ago [1].
>>
>> If you add higher-level query plan structures to the proto file, I
>> fear it will generate more confusion. If the plan ends up being to
>> have a larger proto file, it would be good to move it someplace that
>> isn't Gandiva-specific and clearly indicate that Gandiva is
>> responsible for code generation for certain structures in the proto.
>> We can also address some of these issues through better project
>> documentation and READMEs.
>>
>> [1]:
>> https://lists.apache.org/thread.html/212db05e98549f5938f3af41dade51d7a3e47255178a6c76652adc79@%3Cdev.arrow.apache.org%3E
>>
>> On Sun, Jul 21, 2019 at 4:23 PM Jacques Nadeau <jacq...@apache.org>
>> wrote:
>> >
>> > Some thoughts:
>> >
>> >    1. I think it would make sense to start with a design
>> >    discussion/document about the goals and what we think is
>> >    implementation-specific versus generally applicable. In general, a
>> >    distributed execution plan seems pretty implementation-specific. My
>> >    sense is that you'd never run a distributed execution plan outside of
>> >    the knowledge of the particular execution environment it is running
>> >    within. Part of the reason is that distributed execution usually also
>> >    includes lifecycle management. For example, if you're going to have
>> >    work-stealing or early termination in your execution engine, those are
>> >    operations that stitch into execution coordination (and thus into a
>> >    specific impl). If distributed execution is always engine-specific,
>> >    why try to create a general one for multiple engines?
>> >    2. With regard to making the Gandiva protos more generic: I'd like to
>> >    see more clarity on #1. On one hand, extending things so they are
>> >    reused is good. On the other hand, the more consumers an interface
>> >    has, the more overloads/non-impls you end up with for each consumer.
>> >
>> >
>> > On Sat, Jul 20, 2019 at 10:18 AM Andy Grove <andygrov...@gmail.com> wrote:
>> >
>> > > I recently created a small PoC of distributed query execution on
>> > > Kubernetes using the Rust implementation of Apache Arrow and the
>> > > DataFusion query engine [1].
>> > >
>> > > This PoC uses gRPC to pass query plans to executor nodes, and the proto
>> > > file [2] is largely based on the Gandiva proto file [3]. The PoC is very
>> > > basic, but I think it demonstrates the power of having query plans as
>> > > part of the proto file. This would allow distributed applications to be
>> > > built on Arrow standards in a way that is not dependent on any
>> > > particular implementation of Arrow, and would even allow mixing and
>> > > matching query engines.
>> > >
>> > > I wanted to start this discussion to see what the appetite is here for
>> > > accepting PRs that add query plan structures to the Gandiva proto file,
>> > > and also whether, over time, we could consider making this an Arrow
>> > > proto file rather than a Gandiva-specific one.
>> > >
>> > > Thanks,
>> > >
>> > > Andy.
>> > >
>> > > [1] https://github.com/andygrove/ballista
>> > >
>> > > [2] https://github.com/andygrove/ballista/blob/master/proto/ballista/ballista.proto
>> > >
>> > > [3] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/proto/Types.proto
>> > >
>>
>
