Wes,

That makes sense.

I'll create a fresh PR to add a new protobuf under the Rust module for now
(even though this won't be Rust-specific).
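
To make that concrete, here is a rough sketch of the kind of definition I
have in mind (purely illustrative; the message and field names are
placeholders and the expression representation is left as an open question):

  // Rough sketch only; all names here are placeholders, not a proposal.
  syntax = "proto3";

  package logicalplan;

  // A data source registered by name (see the discussion below); IPC
  // metadata / memory addresses could be layered in later.
  message TableScan {
    string table_name = 1;
    repeated string projected_columns = 2;
  }

  // Expression representation is deliberately left minimal here.
  message Expression {
    string column_name = 1;
  }

  message Projection {
    LogicalPlan input = 1;
    repeated Expression exprs = 2;
  }

  message Selection {
    LogicalPlan input = 1;
    Expression predicate = 2;
  }

  message Sort {
    LogicalPlan input = 1;
    repeated Expression sort_exprs = 2;
  }

  message LogicalPlan {
    oneof plan_type {
      TableScan scan = 1;
      Projection projection = 2;
      Selection selection = 3;
      Sort sort = 4;
    }
  }

The scan-by-name approach corresponds to the "register data sources by name"
idea from the thread below; tying into the IPC metadata would need more
design work.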

Thanks,

Andy.


On Sat, Jan 5, 2019 at 9:19 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hey Andy,
>
> I replied on GitHub and then saw your e-mail thread.
>
> The Gandiva library as it stands right now is not a query engine or an
> execution engine, properly speaking. It is a subgraph compiler for
> creating accelerated expressions for use inside another execution or
> query engine, like it is being used now in Dremio.
>
> For this reason I am -1 on adding logical query plan definitions to
> Gandiva until a more rigorous design effort takes place to decide
> where to build an actual query/execution engine (which includes file /
> dataset scanners, projections, joins, aggregates, filters, etc.) in
> C++. My preference is to start building a from-the-ground-up system
> that will depend on Gandiva to compile expressions during execution.
> Among other things, I don't think it is necessarily a good idea to
> require a query engine to depend on LLVM, so tight coupling to an
> LLVM-based component may not be desirable.
>
> In the meantime, if you want to start creating an (experimental)
> Protobuf / Flatbuffer definition to define a general query execution
> plan (that lives outside Gandiva for the time being) to assist with
> building a query engine in Rust, I think that is fine, but I want to
> make sure we are being deliberate and layering the project components
> in a good way.
>
> - Wes
>
> On Sat, Jan 5, 2019 at 8:15 AM Andy Grove <andygrov...@gmail.com> wrote:
> >
> > I have created a PR to start a discussion around representing logical
> > query plans in Gandiva (ARROW-4163).
> >
> > https://github.com/apache/arrow/pull/3319
> >
> > I think that adding the various steps such as projection, selection,
> > sort, and so on is fairly simple and not contentious. The harder part
> > is how we represent data sources, since this likely has different
> > meanings for different use cases. My thought is that we can register
> > data sources by name (similar to CREATE EXTERNAL TABLE in Hadoop) or
> > tie this into the IPC metadata somehow so we can pass memory addresses
> > and schema information.
> >
> > I would love to hear others' thoughts on this.
> >
> > Thanks,
> >
> > Andy.
>
