Re: [Discuss] Iceberg View Interoperability

2024-12-05 Thread Vladimir Ozerov
Hi Wes, We have a positive experience using Substrait to transfer IRs between engines and express custom behavior via extensions, but in a tightly integrated controlled environment. I am not sure that extensions will work well at a large scale, as it may lead to the same problem we are trying to s

Re: [Discuss] Iceberg View Interoperability

2024-12-05 Thread Weston Pace
Disclaimer: I am also on the Substrait SMC and have made a considerable investment in Substrait in my professional work (Lance), so this is not an unbiased opinion. I just want to give a few words on why I think an IR (and specifically Substrait) is not just another dialect (I'll try and be m

Re: [Discuss] Iceberg View Interoperability

2024-12-05 Thread Vladimir Ozerov
Hi Walaa, Many of these questions do not have universally correct answers. SQL dialects and IRs are incompatible between engines. And even if we find a compatible IR, many of the simplest queries with the same IR will behave differently between engines, largely defeating the whole idea of view int

Re: [Discuss] Iceberg View Interoperability

2024-11-29 Thread Jacques Nadeau
Hey Ajantha, thanks for looping me in. This is a great conversation. FYI, I'm a co-creator of Substrait so read this all with that in mind. Substrait has a couple of key underpinnings that are worth noting: 1. It's a specification first and foremost (with tools to help work with the specification

Re: [Discuss] Iceberg View Interoperability

2024-11-29 Thread Ajantha Bhat
Hi Walaa, thanks for summarizing the questions. > If there are interesting applications of introducing an IR in addition to dialects, should Iceberg adopt only one IR as the canonical "Iceberg IR", or should it be able to "represent IRs" in the same way it is able to "represent dialects"?

Re: [Discuss] Iceberg View Interoperability

2024-11-28 Thread Walaa Eldin Moustafa
Hi Ajantha, I do not clearly see a consensus in this thread. If anything, I see this thread posing more questions than answers. Here is the collection of questions I could distill from the thread: What is the unique problem that is solved if Iceberg represents an IR as opposed to representing

Re: [Discuss] Iceberg View Interoperability

2024-11-04 Thread Matt Topol
For reference, there are two reasons why I chose to add that substrait.go: 1) The Golang Arrow implementation has a compute package which is able to evaluate Substrait expressions as long as the kernels exist in the package. 2) Along the lines of this conversation, I wanted to be able to generica

Re: [Discuss] Iceberg View Interoperability

2024-11-04 Thread Fokko Driesprong
Matt also just added `substrait.go` to the Iceberg-Go implementation that I was reviewing today: https://github.com/apache/iceberg-go/pull/185/files#diff-81cfac9f2e1dcf6265c569d0a3397964f0b78e07f45bb9dcdd3effe0623aaf73 That converts an Iceberg expression into a Substrait one, pretty exciting stuff

Re: [Discuss] Iceberg View Interoperability

2024-11-04 Thread Jean-Baptiste Onofré
Hi Ajantha, During CommunityOverCode, I chatted with Matt Topol about Substrait and ADBC. I checked the Substrait support in DataFusion and it's interesting. I was thinking about where to actually store the Substrait plan (I was thinking about an intermediate SQL representation that we could stor

Re: [Discuss] Iceberg View Interoperability

2024-11-04 Thread Ajantha Bhat
Thanks everyone for the detailed discussions. Looks like we have consensus towards Substrait. Last time I checked it was not adopted by all the engines. But we can work towards the adoption as well. I will explore further on Substrait and come up with the design doc on the same. Thanks, Ajantha

Re: [Discuss] Iceberg View Interoperability

2024-10-28 Thread Amogh Jahagirdar
Hey all, I'm +1 in efforts to make views more interoperable across engines as I believe such efforts would be beneficial for the wider ecosystem. I think the way to do that is through higher fidelity IRs such as Substrait. I agree with Walaa that there's not really a valid distinction between IR

Re: [Discuss] Iceberg View Interoperability

2024-10-28 Thread Piotr Findeisen
Hi, I have no experience with Substrait, but I agree it looks like the tool for the job. Or, as I proposed earlier, we define our own Iceberg IR for the views. We can experiment with serialized IR being stored as a String with new dialect name, and this is how we should get this started. It's pro

Re: [Discuss] Iceberg View Interoperability

2024-10-28 Thread Walaa Eldin Moustafa
Hi Fokko, We can implement Python/Rust/Go clients to interop with the serialized Coral IR. Not sure if it makes sense to have all front-end and back-end implementations (e.g., Spark to Coral IR or Coral IR to Trino, etc) be reimplemented in those languages. Such implementations actually depend on

Re: [Discuss] Iceberg View Interoperability

2024-10-28 Thread Fokko Driesprong
Hey everyone, Views in PyIceberg are not yet as mature as in Java, mostly because tooling in Python tends to work with data frames, rather than SQL. I do think it would be valuable to extend support there. I have a bit of experience in turning SQL into ASTs and extending grammar, and I'm confiden

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread Walaa Eldin Moustafa
I think this may need some more discussion. To me, a "serialized IR" is another form of a "dialect". In this case, this dialect will be mostly specific to Iceberg, and compute engines will still support reading views in their native SQL. There are some data points on this from the Trino community

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread Szehon Ho
[1]: https://github.com/substrait-io/substrait-java/pull/271 [2]: https://github.com/substrait-io/substrait-java/blob/main/isthmus/README.md [3]: https://svn.apache.org/repos/asf/calcite/site/apidocs/

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread rdb...@gmail.com
m/substrait-io/substrait-java/blob/main/isthmus/README.md [3]: https://svn.apache.org/repos/asf/calcite/site/apidocs/org/apache/calcite/sql/dialect/package-summary.html From: Ajantha Bhat Date: Tuesday, 22 October 2024 at 08

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread Szehon Ho
t Date: Tuesday, 22 October 2024 at 08:22 To: dev@iceberg.apache.org Subject: Re: [Discuss] Iceberg View Interoperability CAUTION: This email originates from an external party (outside of Palantir). If you believe this message is suspicious in nature, please use t

Re: [Discuss] Iceberg View Interoperability

2024-10-22 Thread Will Raschkowski
://github.com/substrait-io/substrait-java/blob/main/isthmus/README.md [3]: https://svn.apache.org/repos/asf/calcite/site/apidocs/org/apache/calcite/sql/dialect/package-summary.html From: Ajantha Bhat Date: Tuesday, 22 October 2024 at 08:22 To: dev@iceberg.apache.org Subject: Re: [Discuss] Iceberg View

Re: [Discuss] Iceberg View Interoperability

2024-10-22 Thread Ajantha Bhat
Thanks Dan for the reply. > but I'm not convinced using a procedure is a good idea or really moves things forward in that direction. > I just feel like the procedure approach has a number of drawbacks including, relying on the user to do the translation, being dependent on Spark for defining t

Re: [Discuss] Iceberg View Interoperability

2024-10-17 Thread Daniel Weeks
Hey Ajantha, I think it's good to figure out a path forward for extending view support, but I'm not convinced using a procedure is a good idea or really moves things forward in that direction. As you already indicated, there are a number of different libraries to translate views, but of the vario