Re: Gandiva

Andy Grove Sun, 14 Jul 2019 17:03:16 -0700

I hadn't planned on discussing this here for another week or two, but I am
working on a new PoC right now (https://github.com/andygrove/ballista)
where I have taken the Gandiva protobuf/gRPC definition and started adding
logical query plan messages to it, to enable distributed queries (using the
Rust implementation of Arrow and the DataFusion query engine).
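
As a rough illustration of what a logical query plan message might carry,
here is a minimal sketch. It is not the Ballista or Gandiva protobuf
definition: the PlanNode type, its fields, and the encode/decode helpers are
hypothetical names, and JSON stands in for protobuf so the example runs with
the standard library alone.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical logical plan node; not taken from any real schema."""
    kind: str                           # "scan", "filter", or "projection"
    args: List[str]                     # table name, predicate, or expressions
    input: Optional["PlanNode"] = None  # child node, None for a leaf scan


def to_wire(node: PlanNode) -> dict:
    # Flatten the tree into nested dicts, the shape a message would take.
    d = {"kind": node.kind, "args": node.args}
    if node.input is not None:
        d["input"] = to_wire(node.input)
    return d


def from_wire(d: dict) -> PlanNode:
    # Rebuild the tree on the receiving side.
    child = from_wire(d["input"]) if "input" in d else None
    return PlanNode(d["kind"], list(d["args"]), child)


def encode(node: PlanNode) -> bytes:
    # JSON is an assumption here; a real service would use protobuf.
    return json.dumps(to_wire(node)).encode("utf-8")


def decode(payload: bytes) -> PlanNode:
    return from_wire(json.loads(payload.decode("utf-8")))
```

A client would build a tree such as projection -> filter -> scan, call
encode() to ship it to a worker, and the worker would call decode() before
executing the plan against its local data.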


Although I'm coming at this from a Rust point of view, the protobuf
definition is in no way tied to Rust and I think could be used eventually
by the C++ implementation as well.

I intend to adopt the Flight protocol and IPC, but for now I'm just focused
on the shortest path to a working PoC. I will iterate from there and use
this project to drive requirements for Arrow/DataFusion.

Andy.


On Sun, Jul 14, 2019 at 3:59 PM Wes McKinney <wesmck...@gmail.com> wrote:

> I must apologize for my C++-centrist view of the world -- note that we
> do have a query engine project in the codebase already, in Rust
>
> https://github.com/apache/arrow/tree/master/rust/datafusion
>
> Andy has developed a SQL front end for DataFusion that I think is
> partially contained in https://github.com/andygrove/sqlparser-rs
>
> On Sun, Jul 14, 2019 at 4:56 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi Doug,
> >
> > Gandiva has a much narrower scope of functionality than what you're
> > looking for. It generates optimized computational "kernels" that can
> > be used inside a query engine runtime.
> >
> > Let's consider a SQL query like:
> >
> > SELECT YEAR(a), MONTH(b), DAY(c), log(d - 1)
> > FROM table
> > WHERE e < 5 AND f > 10
> >
> > Gandiva is responsible for creating compiled functions that
> > efficiently evaluate the expressions in the SELECT and WHERE clauses
> > of the query. The other parts of the SQL engine are up to you:
> >
> > * Materializing row batches from "table"
> > * Performing the projection in the SELECT part on each batch
> > * Filtering rows based on the WHERE predicate
> >
> > We are discussing developing a query engine in Apache Arrow, but we
> > need more developers in the project in all areas to make faster
> > progress:
> >
> >
> > https://docs.google.com/document/d/10RoUZmiMQRi_J1FcPeVAUAMJ6d_ZuiEbaM2Y33sNPu4/edit?usp=sharing
> >
> > - Wes
> >
> > On Sun, Jul 14, 2019 at 4:19 PM Doug Friedman <dfrie...@gmail.com> wrote:
> > >
> > > Hello-
> > >
> > > I have a feeling this question will have a painfully simple answer so I
> > > apologize in advance:
> > >
> > > I am interested in the Gandiva portion of the apache arrow project, as a
> > > SQL-like interface to arrow data.  I browsed the source code and I see a
> > > lot relating to building and evaluating expression trees, but I cannot find
> > > anywhere where the Parser/Lexer frontend is defined.  I'm also looking for
> > > a binary/executable interface where I could evaluate a string expression in
> > > the Gandiva language against some arrow data.
> > >
> > > Am I sorely mistaken about the purpose of this sub-project?  Or am I
> > > missing something quite obvious?
> > >
> > > Thank you, and thanks for creating this excellent project.
>
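
The division of labor described in the quoted message (compiled expression
kernels on one side, batch materialization, projection, and filtering on the
other) can be sketched roughly as follows. This is only an illustration
under stated assumptions: the kernels are plain Python functions standing in
for the machine code Gandiva would generate, batches are dicts of lists
rather than Arrow record batches, and every function name here is
hypothetical.

```python
import math
from datetime import date  # columns a, b, c are assumed to hold dates


def make_filter_kernel():
    # Stand-in for a compiled kernel for: WHERE e < 5 AND f > 10
    def kernel(batch):
        return [e < 5 and f > 10 for e, f in zip(batch["e"], batch["f"])]
    return kernel


def make_project_kernel():
    # Stand-in for a compiled kernel for:
    #   SELECT YEAR(a), MONTH(b), DAY(c), log(d - 1)
    def kernel(batch):
        return {
            "year_a": [v.year for v in batch["a"]],
            "month_b": [v.month for v in batch["b"]],
            "day_c": [v.day for v in batch["c"]],
            "log_d": [math.log(v - 1) for v in batch["d"]],
        }
    return kernel


def run_query(batches, filter_kernel, project_kernel):
    # The part the engine owns: iterate over batches, apply the filter
    # mask, then hand the surviving rows to the projection kernel.
    for batch in batches:
        mask = filter_kernel(batch)
        kept = {
            name: [v for v, keep in zip(col, mask) if keep]
            for name, col in batch.items()
        }
        yield project_kernel(kept)
```

Filtering before projecting means log(d - 1) is only evaluated for rows that
pass the predicate. A real engine would make that choice in its planner; the
order here is just the simplest one for the sketch.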
