Thanks, Jacques! I think it's a great idea to have this as an external
project so that it doesn't get tied to a particular set of goals for an
existing project.

Where is a good place to discuss this? Should we create a #substrait room
on Iceberg Slack? ASF Slack? On this thread?

Ryan

On Wed, Sep 8, 2021 at 8:21 AM Jacques Nadeau <jacquesnad...@gmail.com>
wrote:

> Hey all,
>
> For some time I've been thinking that having a common serialized
> representation of query plans would be helpful across multiple related
> projects. I started working on something independently in this vein several
> months ago. Since then, Arrow has started exploring "Arrow IR" and in
> Iceberg, Piotr and others were proposing something similar to support a
> cross-engine structured view. Given the different veins of interest, I
> think we should combine forces on a consolidated consensus-driven solution.
>
> As I've had more conversations with different people, I've come to the
> conclusion that given the complexity of the task and people's
> competing priorities, a separate "Switzerland" project is the best way to
> find common ground. As such, I've started to sketch out a specification [1]
> called Substrait. I'd love to collaborate with the Iceberg community to
> ensure the specification does a good job of supporting the needs of this
> project.
>
> For those that are interested, please join Slack and/or start a discussion
> on GitHub. My first goal is to come to consensus on the type system of
> simple [2], compound [3] and physical [4] types. The general approach I'm
> trying to follow is:
>
>    - Use Spark, Trino, Arrow and Iceberg as the four indicators of
>    whether something should be first class. It must exist in at least two
>    systems to be formalized.
>    - Avoid a formal distinction between logical and physical (types,
>    operators, etc.)
>    - Lean more towards simple types than compound types when systems
>    generally use only a constrained set of parameters (e.g. timestamp(3) and
>    timestamp(6) as opposed to timestamp(x)).
>
>
> Links for Substrait:
> Site: https://substrait.io
> Spec source: https://github.com/substrait-io/substrait/tree/main/site/docs
> Binary format: https://github.com/substrait-io/substrait/tree/main/binary
>
> Please let me know your thoughts,
> Jacques
>
> [1] https://substrait.io/spec/specification/#components
> [2] https://substrait.io/types/simple_logical_types/
> [3] https://substrait.io/types/compound_logical_types/
> [4] https://substrait.io/types/physical_types/
>
>

-- 
Ryan Blue
Tabular
