Thank you Ryan (M), Piotr and Vivekanand for the comments. I have and will continue to address them in the doc. Great to know about Trino views, Piotr!
Thanks to everybody who has offered help with implementation. The spec as it is proposed in the doc has been implemented and is in use at Netflix (currently on Iceberg 0.9). Once we close the spec, we will rebase our code to Iceberg-0.12 and incorporate changes to format and other feedback from the community and should be able to make this MVP implementation available quickly as a PR. A few areas that we have not yet worked on and would love for the community to help are: 1. Time travel on views: Be able to access the view as of a version or time 2. History table: A system table implementation for $versions similar to the $snapshots table in order to display the history of a view 3. Rollback to a version: A way to rollback a view to a previous version 4. Engine agnostic SQL: more below. One comment that is worth a broader discussion is the dialect of the SQL stored in the view metadata. The purpose of the spec is to provide a storage format for view metadata and APIs to access that metadata. The dialect of the SQL stored is an orthogonal question and is outside the scope of this spec. Nonetheless, it is an important concern, so compiling a few suggestions that came up in the comments to continue the discussion: 1. Allow only ANSI-compliant SQL and anything that is truly common across engines in the view definition (this is how currently Netflix uses these 'common' views across Spark and Trino) 2. Add a field to the view metadata to identify the dialect of the SQL. This allows for any desired dialect, but no improved cross-engine operability 3. Store AST produced by Calcite in the view metadata and translate back and forth between engine-supported SQL and AST 4. Intermediate structured language of our own. (What additional functionality does it provide over Calcite?) Given that the view metadata is json, it is easily extendable to incorporate any new fields needed to make the SQL truly compatible across engines. What do you think? regards, Anjali On Tue, Jul 20, 2021 at 3:09 AM Piotr Findeisen <pi...@starburstdata.com> wrote: > Hi, > > FWIW, in Trino we just added Trino views support. > https://github.com/trinodb/trino/pull/8540 > Of course, this is by no means usable by other query engines. > > Anjali, your document does not talk much about compatibility between query > engines. > How do you plan to address that? > > For example, I am familiar with Coral, and I appreciate its powers for > dealing with legacy stuff like views defined by Hive. > I treat it as a great technology supporting transitioning from a query > engine to a better one. > However, I would not base a design of some new system for storing > cross-engine compatible views on that. > > Is there something else we can use? > Maybe the view definition should use some intermediate structured language > that's not SQL? > For example, it could represent logical structure of operations in > semantics manner. > This would eliminate need for cross-engine compatible parsing and analysis. > > Best > PF > > > > On Tue, Jul 20, 2021 at 11:04 AM Ryan Murray <rym...@gmail.com> wrote: > >> Thanks Anjali! >> >> I have left some comments on the document. I unfortunately have to miss >> the community meetup tomorrow but would love to chat more/help w/ >> implementation. >> >> Best, >> Ryan >> >> On Tue, Jul 20, 2021 at 7:42 AM Anjali Norwood >> <anorw...@netflix.com.invalid> wrote: >> >>> Hello, >>> >>> John Zhuge and I would like to propose the following spec for storing >>> view metadata in Iceberg. The proposal has been implemented [1] and is in >>> production at Netflix for over 15 months. >>> >>> >>> https://docs.google.com/document/d/1wQt57EWylluNFdnxVxaCkCSvnWlI8vVtwfnPjQ6Y7aw/edit?usp=sharing >>> >>> [1] >>> https://github.com/Netflix/iceberg/tree/netflix-spark-2.4/view/src/main/java/com/netflix/bdp/view >>> >>> Please let us know your thoughts by adding comments to the doc. >>> >>> Thanks, >>> Anjali. >>> >>