Hey Anjali, I am definitely happy to help with implementing 1-3 in your first list once the spec has been approved by the community. My hope is that the final version of the view spec will make it easy to re-use existing rollback/time travel/metadata etc functionalities.
Regarding SQL dialects.My personal opinion is: Enforcing ANSI-compliant SQL across all engines is hard and probably not desirable while storing Calcite makes it hard for eg python to use views. A project to make a cross language and cross engine IR for sql views and the relevant transpilers is imho outside the scope of this spec and probably deserving an apache project of its own. A smaller IR like Piotr suggested is possible but I feel it will likely quickly snowball into a larger project and slow down adoption of the view spec in iceberg. So I think the most reasonable way forward is to add a dialect field and a warning to engines that views are not (yet) cross compatible. This is at odds with the original spirit of iceberg tables and I wonder how the border community feels about it? I would hope that we can make the view spec engine-free over time and eventually deprecate the dialect field. Best, Ryan PS if anyone is interested in collaborating on engine agnostic views please reach out. I am keen on exploring this topic. On Tue, Jul 20, 2021 at 10:51 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote: > Thank you Ryan (M), Piotr and Vivekanand for the comments. I have and will > continue to address them in the doc. > Great to know about Trino views, Piotr! > > Thanks to everybody who has offered help with implementation. The spec as > it is proposed in the doc has been implemented and is in use at Netflix > (currently on Iceberg 0.9). Once we close the spec, we will rebase our code > to Iceberg-0.12 and incorporate changes to format and other feedback from > the community and should be able to make this MVP implementation available > quickly as a PR. > > A few areas that we have not yet worked on and would love for the > community to help are: > 1. Time travel on views: Be able to access the view as of a version or time > 2. History table: A system table implementation for $versions similar to > the $snapshots table in order to display the history of a view > 3. Rollback to a version: A way to rollback a view to a previous version > 4. Engine agnostic SQL: more below. > > One comment that is worth a broader discussion is the dialect of the SQL > stored in the view metadata. The purpose of the spec is to provide a > storage format for view metadata and APIs to access that metadata. The > dialect of the SQL stored is an orthogonal question and is outside the > scope of this spec. > > Nonetheless, it is an important concern, so compiling a few suggestions > that came up in the comments to continue the discussion: > 1. Allow only ANSI-compliant SQL and anything that is truly common across > engines in the view definition (this is how currently Netflix uses these > 'common' views across Spark and Trino) > 2. Add a field to the view metadata to identify the dialect of the SQL. > This allows for any desired dialect, but no improved cross-engine > operability > 3. Store AST produced by Calcite in the view metadata and translate back > and forth between engine-supported SQL and AST > 4. Intermediate structured language of our own. (What additional > functionality does it provide over Calcite?) > > Given that the view metadata is json, it is easily extendable to > incorporate any new fields needed to make the SQL truly compatible across > engines. > > What do you think? > > regards, > Anjali > > > > On Tue, Jul 20, 2021 at 3:09 AM Piotr Findeisen <pi...@starburstdata.com> > wrote: > >> Hi, >> >> FWIW, in Trino we just added Trino views support. >> https://github.com/trinodb/trino/pull/8540 >> Of course, this is by no means usable by other query engines. >> >> Anjali, your document does not talk much about compatibility between >> query engines. >> How do you plan to address that? >> >> For example, I am familiar with Coral, and I appreciate its powers for >> dealing with legacy stuff like views defined by Hive. >> I treat it as a great technology supporting transitioning from a query >> engine to a better one. >> However, I would not base a design of some new system for storing >> cross-engine compatible views on that. >> >> Is there something else we can use? >> Maybe the view definition should use some intermediate structured >> language that's not SQL? >> For example, it could represent logical structure of operations in >> semantics manner. >> This would eliminate need for cross-engine compatible parsing and >> analysis. >> >> Best >> PF >> >> >> >> On Tue, Jul 20, 2021 at 11:04 AM Ryan Murray <rym...@gmail.com> wrote: >> >>> Thanks Anjali! >>> >>> I have left some comments on the document. I unfortunately have to miss >>> the community meetup tomorrow but would love to chat more/help w/ >>> implementation. >>> >>> Best, >>> Ryan >>> >>> On Tue, Jul 20, 2021 at 7:42 AM Anjali Norwood >>> <anorw...@netflix.com.invalid> wrote: >>> >>>> Hello, >>>> >>>> John Zhuge and I would like to propose the following spec for storing >>>> view metadata in Iceberg. The proposal has been implemented [1] and is in >>>> production at Netflix for over 15 months. >>>> >>>> >>>> https://docs.google.com/document/d/1wQt57EWylluNFdnxVxaCkCSvnWlI8vVtwfnPjQ6Y7aw/edit?usp=sharing >>>> >>>> [1] >>>> https://github.com/Netflix/iceberg/tree/netflix-spark-2.4/view/src/main/java/com/netflix/bdp/view >>>> >>>> Please let us know your thoughts by adding comments to the doc. >>>> >>>> Thanks, >>>> Anjali. >>>> >>>