There are also some good conversations happening on the GitHub discussion forum [1].
We're trying to drive consensus around the type system to start [2]. Would
love the Iceberg community members to weigh in as the contributors are
fairly Arrow heavy atm.

Thanks,
Jacques

[1] https://github.com/substrait-io/substrait/discussions
[2] https://github.com/substrait-io/substrait/discussions/2

On Fri, Sep 10, 2021 at 9:19 AM Ryan Blue <b...@tabular.io> wrote:

> Nevermind, I see there's a Substrait Slack community. Here's the invite
> link for anyone else that's interested:
> https://join.slack.com/t/substrait/shared_invite/zt-vivbux2c-~B1jEWcR0wYhq5k4LHuoLQ
>
> On Fri, Sep 10, 2021 at 9:16 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Thanks, Jacques! I think it's a great idea to have this as an external
>> project so that it doesn't get tied to a particular set of goals for an
>> existing project.
>>
>> Where is a good place to discuss this? Should we create a #substrait room
>> on Iceberg Slack? ASF Slack? On this thread?
>>
>> Ryan
>>
>> On Wed, Sep 8, 2021 at 8:21 AM Jacques Nadeau <jacquesnad...@gmail.com>
>> wrote:
>>
>>> Hey all,
>>>
>>> For some time I've been thinking that having a common serialized
>>> representation of query plans would be helpful across multiple related
>>> projects. I started working on something independently in this vein several
>>> months ago. Since then, Arrow has started exploring "Arrow IR" and in
>>> Iceberg, Piotr and others were proposing something similar to support a
>>> cross-engine structured view. Given the different veins of interest, I
>>> think we should combine forces on a consolidated consensus-driven solution.
>>>
>>> As I've had more conversations with different people, I've come to the
>>> conclusion that given the complexity of the task and people's
>>> competing priorities, a separate "Switzerland" project is the best way to
>>> find common ground. As such, I've started to sketch out a specification [1]
>>> called Substrait. I'd love to collaborate with the Iceberg community to
>>> ensure the specification does a good job of supporting the needs of this
>>> project.
>>>
>>> For those that are interested, please join Slack and/or start a
>>> discussion on GitHub. My first goal is to come to consensus on the type
>>> system of simple [2], compound [3] and physical [4] types. The general
>>> approach I'm trying to follow is:
>>>
>>>    - Use Spark, Trino, Arrow and Iceberg as the four indicators of
>>>    whether something should be first class. It must exist in at least two
>>>    systems to be formalized.
>>>    - Avoid a formal distinction between logical and physical (types,
>>>    operators, etc)
>>>    - Lean more towards simple types than compound types when systems
>>>    generally use only a constrained set of parameters (e.g. timestamp(3) and
>>>    timestamp(6) as opposed to timestamp(x)).
>>>
>>>
>>> Links for Substrait:
>>> Site: https://substrait.io
>>> Spec source:
>>> https://github.com/substrait-io/substrait/tree/main/site/docs
>>> Binary format:
>>> https://github.com/substrait-io/substrait/tree/main/binary
>>>
>>> Please let me know your thoughts,
>>> Jacques
>>>
>>> [1] https://substrait.io/spec/specification/#components
>>> [2] https://substrait.io/types/simple_logical_types/
>>> [3] https://substrait.io/types/compound_logical_types/
>>> [4] https://substrait.io/types/physical_types/
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
> --
> Ryan Blue
> Tabular