Hey all,

For some time I've been thinking that having a common serialized
representation of query plans would be helpful across multiple related
projects. I started working on something independently in this vein several
months ago. Since then, Arrow has started exploring "Arrow IR", and in
Iceberg, Piotr and others have proposed something similar to support a
cross-engine structured view. Given these overlapping interests, I
think we should combine forces on a single, consensus-driven solution.

As I've had more conversations with different people, I've come to the
conclusion that given the complexity of the task and people's
competing priorities, a separate "Switzerland" project is the best way to
find common ground. As such, I've started to sketch out a specification [1]
called Substrait. I'd love to collaborate with the Iceberg community to
ensure the specification does a good job of supporting the needs of this
project.

For those who are interested, please join Slack and/or start a discussion
on GitHub. My first goal is to come to consensus on the type system of
simple [2], compound [3] and physical [4] types. The general approach I'm
trying to follow is:

   - Use Spark, Trino, Arrow and Iceberg as the four reference systems for
   deciding whether something should be first class: it must exist in at
   least two of them to be formalized.
   - Avoid a formal distinction between logical and physical (types,
   operators, etc.).
   - Lean more towards simple types than compound types when systems
   generally use only a constrained set of parameters (e.g. timestamp(3) and
   timestamp(6) as opposed to timestamp(x)).


Links for Substrait:
Site: https://substrait.io
Spec source: https://github.com/substrait-io/substrait/tree/main/site/docs
Binary format: https://github.com/substrait-io/substrait/tree/main/binary

Please let me know your thoughts,
Jacques

[1] https://substrait.io/spec/specification/#components
[2] https://substrait.io/types/simple_logical_types/
[3] https://substrait.io/types/compound_logical_types/
[4] https://substrait.io/types/physical_types/
