There are also some good conversations happening on the GitHub
discussion forum [1].

We're trying to drive consensus around the type system first [2]. We'd
love for Iceberg community members to weigh in, as the contributors are
fairly Arrow-heavy at the moment.

Thanks,
Jacques


[1] https://github.com/substrait-io/substrait/discussions
[2] https://github.com/substrait-io/substrait/discussions/2


On Fri, Sep 10, 2021 at 9:19 AM Ryan Blue <b...@tabular.io> wrote:

> Never mind, I see there's a Substrait Slack community. Here's the invite
> link for anyone else that's interested:
> https://join.slack.com/t/substrait/shared_invite/zt-vivbux2c-~B1jEWcR0wYhq5k4LHuoLQ
>
> On Fri, Sep 10, 2021 at 9:16 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Thanks, Jacques! I think it's a great idea to have this as an external
>> project so that it doesn't get tied to a particular set of goals for an
>> existing project.
>>
>> Where is a good place to discuss this? Should we create a #substrait room
>> on Iceberg Slack? ASF Slack? On this thread?
>>
>> Ryan
>>
>> On Wed, Sep 8, 2021 at 8:21 AM Jacques Nadeau <jacquesnad...@gmail.com>
>> wrote:
>>
>>> Hey all,
>>>
>>> For some time I've been thinking that having a common serialized
>>> representation of query plans would be helpful across multiple related
>>> projects. I started working on something independently in this vein several
>>> months ago. Since then, Arrow has started exploring "Arrow IR" and in
>>> Iceberg, Piotr and others were proposing something similar to support a
>>> cross-engine structured view. Given the different veins of interest, I
>>> think we should combine forces on a consolidated consensus-driven solution.
>>>
>>> As I've had more conversations with different people, I've come to the
>>> conclusion that given the complexity of the task and people's
>>> competing priorities, a separate "Switzerland" project is the best way to
>>> find common ground. As such, I've started to sketch out a specification [1]
>>> called Substrait. I'd love to collaborate with the Iceberg community to
>>> ensure the specification does a good job of supporting the needs of this
>>> project.
>>>
>>> For those that are interested, please join Slack and/or start a
>>> discussion on GitHub. My first goal is to come to consensus on the type
>>> system of simple [2], compound [3] and physical [4] types. The general
>>> approach I'm trying to follow is:
>>>
>>>    - Use Spark, Trino, Arrow and Iceberg as the four indicators of
>>>    whether something should be first class. It must exist in at least two
>>>    systems to be formalized.
>>>    - Avoid a formal distinction between logical and physical (types,
>>>    operators, etc.).
>>>    - Lean more towards simple types than compound types when systems
>>>    generally use only a constrained set of parameters (e.g. timestamp(3) and
>>>    timestamp(6) as opposed to timestamp(x)).
>>>
>>>
>>> Links for Substrait:
>>> Site: https://substrait.io
>>> Spec source:
>>> https://github.com/substrait-io/substrait/tree/main/site/docs
>>> Binary format:
>>> https://github.com/substrait-io/substrait/tree/main/binary
>>>
>>> Please let me know your thoughts,
>>> Jacques
>>>
>>> [1] https://substrait.io/spec/specification/#components
>>> [2] https://substrait.io/types/simple_logical_types/
>>> [3] https://substrait.io/types/compound_logical_types/
>>> [4] https://substrait.io/types/physical_types/
>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
>
> --
> Ryan Blue
> Tabular
>
