Hi Wes,
We have had a positive experience using Substrait to transfer IRs between
engines and express custom behavior via extensions, but in a tightly
integrated, controlled environment. I am not sure that extensions will work
well at a large scale, as it may lead to the same problem we are trying to
s
Disclaimer: I am also on the Substrait SMC and have made a considerable
investment in Substrait in my professional work (Lance), so this is not an
unbiased opinion. I just want to give a few words on why I think an IR
(and specifically Substrait) is not just another dialect (I'll try and be
m
Hi Walaa,
Many of these questions do not have universally correct answers. SQL
dialects and IRs are incompatible between engines. And even if we find a
compatible IR, many of the simplest queries with the same IR will behave
differently between engines, largely defeating the whole idea of view
int
Hey Ajantha, thanks for looping me in. This is a great conversation.
FYI, I'm a co-creator of Substrait so read this all with that in mind.
Substrait has a couple of key underpinnings that are worth noting:
1. It's a specification first and foremost (with tools to help work with
the specification
Hi Walaa, thanks for summarizing the questions.
> *If there are interesting applications of introducing an IR in addition
> to dialects, should Iceberg adopt only one IR as the canonical "Iceberg
> IR", or should it be able to "represent IRs" in the same way it is able to
> "represent dialects"?*
Hi Ajantha,
I do not clearly see a consensus in this thread. If anything, I see this
thread posing more questions than answers. Here is the collection of
questions I could distill from the thread:
** What is the unique problem that is solved if Iceberg represents an IR as
opposed to representing
For reference, there are two reasons why I chose to add that substrait.go:
1) The Golang Arrow implementation has a compute package which is able to
evaluate substrait expressions as long as the kernels exist in the package.
2) Along the lines of this conversation, I wanted to be able to generica
Matt also just added `substrait.go` to the Iceberg-Go implementation that I
was reviewing today:
https://github.com/apache/iceberg-go/pull/185/files#diff-81cfac9f2e1dcf6265c569d0a3397964f0b78e07f45bb9dcdd3effe0623aaf73
That converts an Iceberg expression into a Substrait one, pretty exciting
stuff
Hi Ajantha,
During CommunityOverCode, I chatted with Matt Topol about Substrait and ADBC.
I checked the Substrait support in DataFusion and it's interesting.
I was thinking about where to actually store the Substrait plan (I was
thinking about an intermediate SQL representation that we could stor
Thanks everyone for the detailed discussions.
Looks like we have consensus towards Substrait.
Last time I checked it was not adopted by all the engines. But we can work
towards the adoption as well.
I will explore further on Substrait and come up with the design doc on the
same.
Thanks,
Ajantha
Hey all,
I'm +1 on efforts to make views more interoperable across engines, as I
believe such efforts would be beneficial for the wider ecosystem. I think
the way to do that is through higher fidelity IRs such as Substrait.
I agree with Walaa that there's not really a valid distinction between IR
Hi,
I have no experience with Substrait, but I agree it looks like the tool for
the job.
Or, as I proposed earlier, we define our own Iceberg IR for the views.
We can experiment with serialized IR being stored as a String with new
dialect name, and this is how we should get this started.
It's pro
Hi Fokko,
We can implement Python/Rust/Go clients to interop with the serialized
Coral IR. Not sure if it makes sense to have all front-end and back-end
implementations (e.g., Spark to Coral IR or Coral IR to Trino, etc) be
reimplemented in those languages. Such implementations actually depend on
Hey everyone,
Views in PyIceberg are not yet as mature as in Java, mostly because tooling
in Python tends to work with data frames, rather than SQL. I do think it
would be valuable to extend support there.
I have a bit of experience in turning SQL into ASTs and extending grammar,
and I'm confiden
I think this may need some more discussion.
To me, a "serialized IR" is another form of a "dialect". In this case, this
dialect will be mostly specific to Iceberg, and compute engines will still
support reading views in their native SQL. There are some data points on
this from the Trino community
>>>
>>> [1]: https://github.com/substrait-io/substrait-java/pull/271
>>>
>>> [2]:
>>> https://github.com/substrait-io/substrait-java/blob/main/isthmus/README.md
>>>
>>> [3]:
>>> https://svn.apache.org/repos/asf/calcite/site/apidocs/
://github.com/substrait-io/substrait-java/blob/main/isthmus/README.md
[3]:
https://svn.apache.org/repos/asf/calcite/site/apidocs/org/apache/calcite/sql/dialect/package-summary.html
From: Ajantha Bhat
Date: Tuesday, 22 October 2024 at 08:22
To: dev@iceberg.apache.org
Subject: Re: [Discuss] Iceberg View
Thanks Dan for the reply.
> but I'm not convinced using a procedure is a good idea or really moves
> things forward in that direction.
> I just feel like the procedure approach has a number of drawbacks
> including, relying on the user to do the translation, being dependent on
> Spark for defining t
Hey Ajantha,
I think it's good to figure out a path forward for extending view support,
but I'm not convinced using a procedure is a good idea or really moves
things forward in that direction.
As you already indicated, there are a number of different libraries to
translate views, but of the vario