Wes,

That makes sense. I'll create a fresh PR to add a new protobuf under the
Rust module for now (even though it won't be Rust specific). A rough,
purely illustrative sketch of what such a plan definition might look like
is appended below the quoted thread.

Thanks,

Andy.

On Sat, Jan 5, 2019 at 9:19 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hey Andy,
>
> I replied on GitHub and then saw your e-mail thread.
>
> The Gandiva library as it stands right now is not a query engine or an
> execution engine, properly speaking. It is a subgraph compiler for
> creating accelerated expressions for use inside another execution or
> query engine, as it is being used now in Dremio.
>
> For this reason I am -1 on adding logical query plan definitions to
> Gandiva until a more rigorous design effort takes place to decide
> where to build an actual query/execution engine (which includes file /
> dataset scanners, projections, joins, aggregates, filters, etc.) in
> C++. My preference is to start building a from-the-ground-up system
> that will depend on Gandiva to compile expressions during execution.
> Among other things, I don't think it is necessarily a good idea to
> require a query engine to depend on LLVM, so tight coupling to an
> LLVM-based component may not be desirable.
>
> In the meantime, if you want to start creating an (experimental)
> Protobuf / Flatbuffer definition for a general query execution
> plan (that lives outside Gandiva for the time being) to assist with
> building a query engine in Rust, I think that is fine, but I want to
> make sure we are being deliberate and layering the project components
> in a good way.
>
> - Wes
>
> On Sat, Jan 5, 2019 at 8:15 AM Andy Grove <andygrov...@gmail.com> wrote:
> >
> > I have created a PR to start a discussion around representing logical
> > query plans in Gandiva (ARROW-4163).
> >
> > https://github.com/apache/arrow/pull/3319
> >
> > I think that adding the various steps such as projection, selection,
> > sort, and so on is fairly simple and not contentious. The harder part
> > is how we represent data sources, since this likely means different
> > things for different use cases. My thought is that we can register
> > data sources by name (similar to CREATE EXTERNAL TABLE in Hadoop), or
> > tie this into the IPC metadata somehow so we can pass memory addresses
> > and schema information.
> >
> > I would love to hear others' thoughts on this.
> >
> > Thanks,
> >
> > Andy.
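To make the idea concrete, here is a minimal sketch of the kind of Protobuf
definition being discussed (projection, selection, and a data source
registered by name). All message and field names here are hypothetical and
only illustrative; they are not the definitions in the actual PR.

    // sketch.proto -- illustrative only, not the proposed schema
    syntax = "proto3";

    package arrow.logicalplan;

    // A data source registered by name, similar in spirit to
    // CREATE EXTERNAL TABLE.
    message TableScan {
      string table_name = 1;
      repeated string projected_columns = 2;
    }

    // Project a list of expressions from the input plan.
    // Expressions are plain strings here for brevity; a real definition
    // would use a structured expression message.
    message Projection {
      LogicalPlan input = 1;
      repeated string exprs = 2;
    }

    // Selection (filter) applied to the input plan.
    message Selection {
      LogicalPlan input = 1;
      string predicate = 2;
    }

    // A logical plan node is one of the operators above; plans nest by
    // referencing an input plan.
    message LogicalPlan {
      oneof plan {
        TableScan scan = 1;
        Projection projection = 2;
        Selection selection = 3;
      }
    }

The point of the nesting (each operator holding an input LogicalPlan) is that
a plan like "project a, b from table t where a > 1" becomes a small tree:
Projection -> Selection -> TableScan. Sort, join, and aggregate would slot in
as additional oneof variants later.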