Hey all,

For some time I've been thinking that having a common serialized representation of query plans would be helpful across multiple related projects. I started working on something independently in this space several months ago. Since then, Arrow started exploring "Arrow IR" and Iceberg was proposing something similar to support a cross-engine structured view. Given the different veins of interest, I think we should combine forces on a consolidated consensus-driven solution.

As I've had more conversations with different people, I've come to the conclusion that given the complexity of the task and people's competing priorities, a separate "Switzerland project" is the best way to find common ground. As such, I've started to sketch out a specification [1] called Substrait. One of my key goals with this effort is to expose Calcite functionality to more users and expose alternative ways to encapsulate Calcite functionality as a microservice or series of microservices.

For those that are interested, please join the Substrait Slack. My first goal is to come to a consensus on the type system of simple [2], compound [3] and physical [4] types.

The general approach I'm proposing:

- Use Spark, Trino, Arrow and Iceberg as the four indicators of whether something should be part of the spec. It must exist in at least two systems to be formalized.
- Avoid a formal distinction between logical and physical (types, operators, etc)
- Lean more towards simple types than compound types when systems generally use only a constrained set of parameters (e.g. timestamp(3) and timestamp(6) as opposed to timestamp(x)).
- Provide substantial structured extensibility (avoid black boxes as much as possible)

Links for Substrait:

Site: https://substrait.io
Spec source: https://github.com/substrait-io/substrait/tree/main/site/docs
Binary format: https://github.com/substrait-io/substrait/tree/main/binary

Would love to hear your thoughts!

Jacques

[1] https://substrait.io/spec/specification/#components
[2] https://substrait.io/types/simple_logical_types/
[3] https://substrait.io/types/compound_logical_types/
[4] https://substrait.io/types/physical_types/
