Thanks, Jacques! I think it's a great idea to have this as an external project so that it doesn't get tied to a particular set of goals for an existing project.
Where is a good place to discuss this? Should we create a #substrait room on Iceberg Slack? ASF Slack? On this thread?

Ryan

On Wed, Sep 8, 2021 at 8:21 AM Jacques Nadeau <jacquesnad...@gmail.com> wrote:

> Hey all,
>
> For some time I've been thinking that having a common serialized
> representation of query plans would be helpful across multiple related
> projects. I started working on something independently in this vein several
> months ago. Since then, Arrow has started exploring "Arrow IR" and in
> Iceberg, Piotr and others were proposing something similar to support a
> cross-engine structured view. Given the different veins of interest, I
> think we should combine forces on a consolidated consensus-driven solution.
>
> As I've had more conversations with different people, I've come to the
> conclusion that given the complexity of the task and people's
> competing priorities, a separate "Switzerland" project is the best way to
> find common ground. As such, I've started to sketch out a specification [1]
> called Substrait. I'd love to collaborate with the Iceberg community to
> ensure the specification does a good job of supporting the needs of this
> project.
>
> For those that are interested, please join Slack and/or start a discussion
> on GitHub. My first goal is to come to consensus on the type system of
> simple [2], compound [3] and physical [4] types. The general approach I'm
> trying to follow is:
>
> - Use Spark, Trino, Arrow and Iceberg as the four indicators of
>   whether something should be first class. It must exist in at least two
>   systems to be formalized.
> - Avoid a formal distinction between logical and physical (types,
>   operators, etc)
> - Lean more towards simple types than compound types when systems
>   generally use only a constrained set of parameters (e.g. timestamp(3) and
>   timestamp(6) as opposed to timestamp(x)).
>
> Links for Substrait:
> Site: https://substrait.io
> Spec source: https://github.com/substrait-io/substrait/tree/main/site/docs
> Binary format: https://github.com/substrait-io/substrait/tree/main/binary
>
> Please let me know your thoughts,
> Jacques
>
> [1] https://substrait.io/spec/specification/#components
> [2] https://substrait.io/types/simple_logical_types/
> [3] https://substrait.io/types/compound_logical_types/
> [4] https://substrait.io/types/physical_types/

-- 
Ryan Blue
Tabular