+1

On Sun, Aug 22, 2021 at 10:02 AM Ryan Blue <b...@tabular.io> wrote:
> Thanks for working on this, Anjali! It’s great to see the thorough discussion here with everyone.
>
> For the discussion about SQL dialect, I think that the right first step is to capture the SQL or query dialect. That will give us the most flexibility. Engines that can use Coral for translation can attempt to convert, and engines that don’t can see if the SQL is valid and can be used.
>
> I think that the idea to create a minimal IR is an interesting one, but it can be added later. We will always need to record the SQL and dialect, even if we translate to IR, because users are going to configure views using SQL. Uses like showing view history or debugging need to show the original SQL, plus relevant information like where it was created and the SQL dialect. We should be able to add this later by adding additional metadata to the view definition. I don’t think that it would introduce breaking changes to add a common representation that can be optionally consumed.
>
> Let’s continue talking about a minimal IR, separately. View translation is a hard problem. Right now, to get views across engines we have to compromise confidence. IR is a way to have strong confidence, but with limited expressibility. I think that’s a good trade in a lot of cases and is worth pursuing, even if it will take a long time.
>
> Jacques makes a great point about types, but I think that the right option here is to continue using Iceberg types. We’ve already had discussions about whether Iceberg should support annotating types with engine-specific ones, so we have a reasonable way to improve this while also providing compatibility across engines: char(n) is not necessarily supported everywhere, and mapping it to string will make sense in most places. The schema is primarily used to validate that the data produced by the query hasn’t changed, and that is more about the number of columns in structs and the names of fields rather than exact types.
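The lax schema check Ryan describes — compare field count and names, and relax engine-specific string types rather than matching them exactly — could be sketched as follows. This is a hypothetical illustration only; the function names and type strings are invented and are not part of the Iceberg spec.

```python
# Hypothetical sketch: validate a view's stored schema against the schema
# produced by the view SQL, comparing arity and field names while relaxing
# engine-specific char(n)/varchar(n) types to a plain string type.
# All names here are illustrative, not from any spec.

def normalize(type_name: str) -> str:
    """Map engine-specific string types down to a common representation."""
    if type_name.startswith(("char(", "varchar(")):
        return "string"
    return type_name

def schemas_compatible(stored, produced):
    """Compare schemas by field count, field names, and normalized types only."""
    if len(stored) != len(produced):
        return False
    return all(
        s_name == p_name and normalize(s_type) == normalize(p_type)
        for (s_name, s_type), (p_name, p_type) in zip(stored, produced)
    )

stored = [("id", "long"), ("name", "string")]
produced = [("id", "long"), ("name", "varchar(10)")]  # engine resolved varchar(10)
assert schemas_compatible(stored, produced)  # ok: varchar(10) relaxes to string
```

Under this scheme, the varchar(10)-vs-string substitution Ryan mentions next passes validation, while a dropped or renamed column still fails it.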
> We can fix up types when substituting without losing too much: if the SQL produces a varchar(10) field that the view metadata says is a string, then it’s okay that it is varchar(10). There is some loss in that we don’t know if it was originally varchar(5), but I think that this is not going to cause too many issues. Not all engines will even validate that the schema has not changed, since it could be valid to use select * from x where ... and allow new fields to appear.
>
> Right now, I think we should move forward with the proposal as it is, and pursue type annotations and possibly IR in parallel. Does that sound reasonable to everyone?
>
> Ryan
>
> On Thu, Jul 29, 2021 at 7:50 AM Piotr Findeisen <pi...@starburstdata.com> wrote:
>
>> Hi Anjali,
>>
>> That's a nice summary.
>>
>> Re the dialect field: it shouldn't be much trouble to have it (or any other way to identify the application that created the view), and it might be useful. Why not make it required from the start?
>>
>> Re "expanded/resolved SQL" -- I don't understand yet what we would put there, so I cannot comment.
>>
>> I agree that it's nice to get something out of the door, and I see how the current proposal fits some needs already. However, I am concerned about the proliferation of non-cross-engine compatible views if we do that.
>>
>> Also, if we later agree on any compatible approach (portable subset of SQL, engine-agnostic IR, etc.), then from the perspective of each engine it would be a breaking change. Unless we make the compatible approach as expressive as the full power of SQL, some views that are possible to create in v1 will not be possible to create in v2. Thus, if v1 is "some SQL" and v2 is "something awesomely compatible", we may not be able to roll it out.
>>
>>> the convention of common SQL has been working for a majority of users.
>>> SQL features commonly used are column projections, simple filter application, joins, grouping, and common aggregate and scalar functions. A few users occasionally would like to use Trino or Spark specific functions but are sometimes able to find a way to use a function that is common to both engines.
>>
>> It's an awesome summary of what constructs are necessary to be able to define useful views, while also keeping them portable.
>>
>> To be able to express column projections, simple filter application, joins, grouping, and common aggregate and scalar functions in a structured IR, how much effort do you think would be required? We didn't really talk about downsides of a structured approach, other than that it looks complex. If we indeed estimate it as a multi-year effort, I wouldn't argue for it. Maybe I was overly optimistic though.
>>
>> As Jack mentioned, for an engine-specific approach that's not supposed to be consumed by multiple engines, we may be better served with an approach that's outside of the Iceberg spec, like https://github.com/trinodb/trino/pull/8540.
>>
>> Best,
>> PF
>>
>> On Thu, Jul 29, 2021 at 12:33 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>>
>>> Hi,
>>>
>>> Thank you for all the comments. I will try to address them all here together.
>>>
>>> - @all Cross-engine compatibility of view definition: Multiple options such as engine-agnostic SQL or IR of some form have been mentioned. We can all agree that all of these options are non-trivial to design/implement (perhaps a multi-year effort based on the option chosen) and merit further discussion. I would like to suggest that we continue this discussion but target this work for the future (v2?). In v1, we can add an optional dialect field and an optional expanded/resolved SQL field that can be interpreted by engines as they see fit.
>>> V1 can unlock many use cases where the views are either accessed by a single engine, or multi-engine use cases where a (common) subset of SQL is supported. This proposal allows for desirable features such as versioning of views and a common format of storing view metadata, while allowing extensibility in the future. *Does anyone feel strongly otherwise?*
>>>
>>> - @Piotr As for common views at Netflix, the restrictions on SQL are not enforced, but are advised as best practices. The convention of common SQL has been working for a majority of users. SQL features commonly used are column projections, simple filter application, joins, grouping, and common aggregate and scalar functions. A few users occasionally would like to use Trino or Spark specific functions but are sometimes able to find a way to use a function that is common to both engines.
>>>
>>> - @Jacques and @Jack Iceberg data types are engine agnostic and hence were picked for storing the view schema. Thinking further, the schema field should be made 'optional', since not all engines require it (e.g. Spark does not need it and Trino uses it only for validation).
>>>
>>> - @Jacques Table references in the views can be arbitrary objects such as tables from other catalogs or Elasticsearch tables, etc. I will clarify it in the spec.
>>>
>>> I will work on incorporating all the comments in the spec and make the next revision available for review soon.
>>>
>>> Regards,
>>> Anjali.
>>>
>>> On Tue, Jul 27, 2021 at 2:51 AM Piotr Findeisen <pi...@starburstdata.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks Jack and Jacques for sharing your thoughts.
>>>>
>>>> I agree that tracking dialect/origin is better than nothing. I think having a Map {dialect: sql} is not going to buy us much. I.e. it would be useful if there was some external app (or a human being) that would write those alternative SQLs for each dialect.
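For concreteness, the "Map {dialect: sql}" shape being debated here (Jack proposes it in his quoted message further down, and Piotr is responding to it) could be sketched like this. The map keys, the helper function, and the fallback policy are all hypothetical illustrations, not part of the proposed spec.

```python
# Hypothetical sketch of a per-dialect SQL map in view metadata, with a
# fallback when the reading engine's key is absent. In Jack's proposal the
# fallback would be a shared intermediate representation; here we simply
# fall back to an arbitrary default entry. All names are invented.

view_sql = {
    "spark-3.1": "SELECT id, upper(name) AS name FROM db.events",
    "trino-358": "SELECT id, upper(name) AS name FROM db.events",
}

def sql_for_engine(sqls: dict, engine: str, fallback_key: str = "spark-3.1") -> str:
    # An engine reads its own entry if present, otherwise uses the fallback.
    return sqls.get(engine, sqls[fallback_key])

assert sql_for_engine(view_sql, "trino-358").startswith("SELECT")
assert sql_for_engine(view_sql, "flink-1.13") == view_sql["spark-3.1"]
```

Piotr's objection applies directly to this sketch: unless some application actually populates the alternative entries, every view ends up with a single-entry map and nothing is gained over a plain string plus a dialect field.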
>>>> Otherwise I am not imagining Spark writing SQL for Spark and Trino, or Trino writing SQL for Trino and Spark.
>>>>
>>>> Thanks Jacques for a good summary of SQL supporting options. While I like the idea of starting with the Trino SQL ANTLR grammar file (it's really well written and resembles the spec quite well), you made a good point that grammar is only part of the problem. Coercions, function resolution, dereference resolution, and table resolution are parts of query analysis that go beyond just grammar. In fact, column scoping rules -- while clearly defined by the spec -- may easily differ between engines (pretty usual). That's why I would rather lean towards some intermediate representation that is *not* SQL, doesn't require parsing (is already structural), nor analysis (no scopes! no implicit coercions!). Before we embark on such a journey, it would be interesting to hear @Martin Traverso <mar...@starburstdata.com>'s thoughts on feasibility though.
>>>>
>>>> Best,
>>>> PF
>>>>
>>>> On Fri, Jul 23, 2021 at 3:27 AM Jacques Nadeau <jacquesnad...@gmail.com> wrote:
>>>>
>>>>> Some thoughts...
>>>>>
>>>>> - In general, many engines want (or may require) a resolved SQL field. This--at minimum--typically includes star expansion, since traditional view behavior is that stars are expanded at view creation time (since this is the only way to guarantee that the view returns the same logical definition even if the underlying table changes). This may also include a replacement of relative object names with absolute object names based on the session catalog & namespace. If I recall correctly, Hive does both of these things.
>>>>>
>>>>> - It isn't clear in the spec whether the table references used in views are restricted to other Iceberg objects or can be arbitrary objects in the context of a particular engine. Maybe I missed this?
>>>>> For example, can I have a Trino engine view that references an Elasticsearch table stored in an Iceberg view?
>>>>>
>>>>> - Restricting schemas to the Iceberg types will likely lead to unintended consequences. I appreciate the attraction to it, but I think it would either create artificial barriers around the types of SQL that are allowed and/or mean that replacing a CTE with a view could potentially change the behavior of the query, which I believe violates most typical engine behaviors. A good example of this is the simple SQL statement "SELECT c1, 'foo' as c2 from table1". In many engines (and Calcite by default, I believe), c2 will be specified as a CHAR(3). In this Iceberg context, is this view disallowed? If it isn't disallowed, then you have an issue where the view schema will be required to be different from a CTE, since the engine will resolve it differently than Iceberg. Even if you ignore CHAR(X), you've still got VARCHAR(X) to contend with...
>>>>>
>>>>> - It is important to remember that Calcite is a set of libraries and not a specification. There are things that can be specified in Calcite, but in general it doesn't have formal specification as a first principle. It is more implementation as a first principle. This is in contrast to projects like Arrow and Iceberg, which start with well-formed specifications. I've been working with Calcite since before it was an Apache project and I wouldn't recommend adopting it as any form of a specification. On the flip side, I am very supportive of using it as a reference implementation standard for Iceberg view consumption, manipulation, etc. If anything, I'd suggest we start with the adoption of a relatively clear grammar, e.g. the ANTLR grammar file that Spark [1] and/or Trino [2] use.
>>>>> Even that is not a complete specification, as the grammar must still be interpreted with regard to type promotion, function resolution, consistent unnamed expression naming, etc. that aren't defined at the grammar level. I'd definitely avoid using Calcite's JavaCC grammar, as it heavily embeds implementation details (in a good way) and relies on some fairly complex logic in the validator and sql2rel components to be fully resolved/comprehended.
>>>>>
>>>>> Given the above, I suggest having a field which describes the dialect (origin?) of the view, and then each engine can decide how they want to consume/mutate that view (and whether they want to or not). It does risk being a dumping ground. Nonetheless, I'd expect the alternative of establishing a formal SQL specification to be a similarly long process to the couple of years it took to build the Arrow and Iceberg specifications. (Realistically, there is far more to specify here than there is in either of those two domains.)
>>>>>
>>>>> Some other notes:
>>>>>
>>>>> - Calcite does provide a nice reference document [3], but it is not sufficient to implement what is necessary for parsing/validating/resolving a SQL string correctly/consistently.
>>>>>
>>>>> - Projects like Coral [4] are interesting here, but even Coral is based roughly on "HiveQL", which also doesn't have a formal specification process outside of the Hive version you're running. See this thread in the Coral Slack [5].
>>>>>
>>>>> - ZetaSQL [6] also seems interesting in this space. It feels closer to specification-based [7] than Calcite, but is much less popular in the big data domain. I also haven't reviewed its SQL completeness closely, a strength of Calcite.
>>>>>
>>>>> - One of the other problems with building against an implementation as opposed to a specification (e.g.
>>>>> Calcite) is that it can make it difficult or near impossible to implement the same algorithms again without a bunch of reverse engineering. If interested in an example of this, see the discussion behind the LZ4 deprecation in the Parquet spec [8] for how painful this kind of mistake can become.
>>>>>
>>>>> - I'd love to use the SQL specification itself, but nobody actually implements it in its entirety and it has far too many places where things are "implementation-defined" [9].
>>>>>
>>>>> [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
>>>>> [2] https://github.com/trinodb/trino/blob/master/core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4
>>>>> [3] https://calcite.apache.org/docs/reference.html
>>>>> [4] https://github.com/linkedin/coral
>>>>> [5] https://coral-sql.slack.com/archives/C01FHBJR20Y/p1608064004073000
>>>>> [6] https://github.com/google/zetasql
>>>>> [7] https://github.com/google/zetasql/blob/master/docs/one-pager.md
>>>>> [8] https://github.com/apache/parquet-format/blob/master/Compression.md#lz4
>>>>> [9] https://twitter.com/sc13ts/status/1413728808830525440
>>>>>
>>>>> On Thu, Jul 22, 2021 at 12:01 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> Did not notice that we are also discussing cross-engine interoperability here, I will add my response from the design doc here.
>>>>>>
>>>>>> I would personally prefer cross-engine interoperability as a goal and getting the spec in the right structure in the initial release, because:
>>>>>>
>>>>>> 1. I believe that cross-engine compatibility is a critical feature of Iceberg. If I am a user of an existing data lake that already supports views (e.g. Hive), I don't even need Iceberg to have this view feature. I can do what is now done for Trino to use views with Iceberg.
>>>>>> I can also just use a table property to indicate the table is a view, store the view SQL as a table property, and do my own thing in any query engine to support all the view features. One of the most valuable and unique features that an Iceberg view can unlock is to allow a view to be created in one engine and read by another. Not supporting cross-engine compatibility feels like losing a lot of value to me.
>>>>>>
>>>>>> 2. In the view definition, it feels inconsistent to me that we have "schema" as an Iceberg native schema, but the "sql" field as view SQL that can come from any query engine. If the engine already needs to convert the view schema to an Iceberg schema, it should just do the same for the view SQL.
>>>>>>
>>>>>> Regarding the way to achieve it, I think it comes down to either Apache Calcite (or some other third-party alternative I don't know of) or our own implementation of some intermediate representation. I don't have a very strong opinion, but my thoughts are the following:
>>>>>>
>>>>>> 1. Calcite is supposed to be the go-to software to deal with this kind of issue, but my personal concern is that the integration is definitely going to be much more involved, and it will become another barrier for newer engines to onboard, because an engine not only needs to implement Iceberg APIs but also needs Calcite support. It will also start to become a constant discussion around what we maintain and what we should push to Calcite, similar to our situation today with Spark.
>>>>>>
>>>>>> 2. Another way I am leaning towards, as Piotr also suggested, is to have a native lightweight logical query structure representation of the view SQL and store that instead of the SQL string.
>>>>>> We already deal with Expressions in Iceberg, and engines have to convert predicates to Iceberg expressions for predicate pushdown. So I think it would not be hard to extend on that to support this use case. Different engines can build this logical structure when traversing their own AST during a create-view query.
>>>>>>
>>>>>> 3. With these considerations, I think the "sql" field can potentially be a map (maybe called "engine-sqls"?), where the key is the engine type and version, like "Spark 3.1", and the value is the view SQL string. In this way, the engine that creates the view can still read the SQL directly, which might lead to better engine-native integration and avoid redundant parsing. But in this approach there is always a default intermediate representation it can fall back to when the engine's key is not found in the map. If we want to make incremental progress and delay the design for the intermediate representation, I think we should at least use this map instead of just a single string.
>>>>>>
>>>>>> Thanks,
>>>>>> Jack Ye
>>>>>>
>>>>>> On Thu, Jul 22, 2021 at 6:35 AM Piotr Findeisen <pi...@starburstdata.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> First of all, thank you for this discussion and all the view-related work!
>>>>>>>
>>>>>>> While I agree that solving the cross-engine compatibility problem may not be a primary feature today, I am concerned that not thinking about this from the start may "tunnel" us into a wrong direction. Cross-engine compatible views would be such a cool feature that it is hard to just let it pass.
>>>>>>>
>>>>>>> My thinking about a smaller IR may be a side-product of me not being familiar enough with Calcite. However, the new IR being focused on a compatible representation, and not being tied to anything, are actually good things.
>>>>>>> For example, we need to focus on the JSON representation, but we don't need to deal with tree traversal or anything, so the code for this could be pretty simple.
>>>>>>>
>>>>>>>> Allow only ANSI-compliant SQL and anything that is truly common across engines in the view definition (this is how currently Netflix uses these 'common' views across Spark and Trino)
>>>>>>>
>>>>>>> That's interesting. Anjali, do you have means to enforce that, or is this just a convention?
>>>>>>>
>>>>>>> What are the common building blocks (relational operations, constructs and functions) that you found sufficient for expressing your views? Being able to enumerate them could help validate the various approaches considered here, including the feasibility of a dedicated representation.
>>>>>>>
>>>>>>> Best,
>>>>>>> PF
>>>>>>>
>>>>>>> On Thu, Jul 22, 2021 at 2:28 PM Ryan Murray <rym...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hey Anjali,
>>>>>>>>
>>>>>>>> I am definitely happy to help with implementing 1-3 in your first list once the spec has been approved by the community. My hope is that the final version of the view spec will make it easy to re-use existing rollback/time travel/metadata etc. functionalities.
>>>>>>>>
>>>>>>>> Regarding SQL dialects, my personal opinion is: enforcing ANSI-compliant SQL across all engines is hard and probably not desirable, while storing Calcite makes it hard for e.g. Python to use views. A project to make a cross-language and cross-engine IR for SQL views and the relevant transpilers is IMHO outside the scope of this spec and probably deserving of an Apache project of its own.
>>>>>>>> A smaller IR like Piotr suggested is possible, but I feel it will likely quickly snowball into a larger project and slow down adoption of the view spec in Iceberg. So I think the most reasonable way forward is to add a dialect field and a warning to engines that views are not (yet) cross-compatible. This is at odds with the original spirit of Iceberg tables and I wonder how the broader community feels about it? I would hope that we can make the view spec engine-free over time and eventually deprecate the dialect field.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Ryan
>>>>>>>>
>>>>>>>> PS: if anyone is interested in collaborating on engine-agnostic views, please reach out. I am keen on exploring this topic.
>>>>>>>>
>>>>>>>> On Tue, Jul 20, 2021 at 10:51 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Thank you Ryan (M), Piotr and Vivekanand for the comments. I have and will continue to address them in the doc. Great to know about Trino views, Piotr!
>>>>>>>>>
>>>>>>>>> Thanks to everybody who has offered help with the implementation. The spec as it is proposed in the doc has been implemented and is in use at Netflix (currently on Iceberg 0.9). Once we close the spec, we will rebase our code to Iceberg 0.12, incorporate changes to the format and other feedback from the community, and should be able to make this MVP implementation available quickly as a PR.
>>>>>>>>>
>>>>>>>>> A few areas that we have not yet worked on and would love for the community to help with are:
>>>>>>>>> 1. Time travel on views: be able to access the view as of a version or time
>>>>>>>>> 2. History table: a system table implementation for $versions, similar to the $snapshots table, in order to display the history of a view
>>>>>>>>> 3.
>>>>>>>>> Rollback to a version: a way to roll back a view to a previous version
>>>>>>>>> 4. Engine-agnostic SQL: more below.
>>>>>>>>>
>>>>>>>>> One comment that is worth a broader discussion is the dialect of the SQL stored in the view metadata. The purpose of the spec is to provide a storage format for view metadata and APIs to access that metadata. The dialect of the SQL stored is an orthogonal question and is outside the scope of this spec.
>>>>>>>>>
>>>>>>>>> Nonetheless, it is an important concern, so here are a few suggestions that came up in the comments, compiled to continue the discussion:
>>>>>>>>> 1. Allow only ANSI-compliant SQL and anything that is truly common across engines in the view definition (this is how currently Netflix uses these 'common' views across Spark and Trino)
>>>>>>>>> 2. Add a field to the view metadata to identify the dialect of the SQL. This allows for any desired dialect, but no improved cross-engine operability
>>>>>>>>> 3. Store the AST produced by Calcite in the view metadata and translate back and forth between engine-supported SQL and the AST
>>>>>>>>> 4. An intermediate structured language of our own. (What additional functionality does it provide over Calcite?)
>>>>>>>>>
>>>>>>>>> Given that the view metadata is JSON, it is easily extendable to incorporate any new fields needed to make the SQL truly compatible across engines.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Anjali
>>>>>>>>>
>>>>>>>>> On Tue, Jul 20, 2021 at 3:09 AM Piotr Findeisen <pi...@starburstdata.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> FWIW, in Trino we just added Trino views support: https://github.com/trinodb/trino/pull/8540
>>>>>>>>>> Of course, this is by no means usable by other query engines.
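To make suggestion 4 in the list above concrete: a minimal structured, engine-agnostic IR for a simple filter-plus-projection view might serialize to JSON along these lines. This shape is invented purely for illustration — it is not part of the proposed spec or of any existing format.

```python
import json

# Purely illustrative sketch of a tiny engine-agnostic IR for a view that
# projects two columns over a filtered scan. Every key and operation name
# here is hypothetical, invented for this example.

view_ir = {
    "operation": "project",
    "columns": [{"name": "id"}, {"name": "amount"}],
    "input": {
        "operation": "filter",
        "predicate": {"op": "gt", "column": "amount", "literal": 100},
        "input": {"operation": "scan", "table": "db.orders"},
    },
}

# The tree is already structural: an engine consuming it needs no SQL
# parsing or scope analysis, only a walk over nested dicts.
serialized = json.dumps(view_ir)
assert json.loads(serialized)["input"]["operation"] == "filter"
```

This is roughly the trade-off discussed throughout the thread: such a tree is trivial to consume (no parsing, no scopes, no implicit coercions), but the set of expressible operations is far smaller than full SQL.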
>>>>>>>>>> Anjali, your document does not talk much about compatibility between query engines. How do you plan to address that?
>>>>>>>>>>
>>>>>>>>>> For example, I am familiar with Coral, and I appreciate its powers for dealing with legacy stuff like views defined by Hive. I treat it as a great technology supporting transitioning from one query engine to a better one. However, I would not base the design of some new system for storing cross-engine compatible views on it.
>>>>>>>>>>
>>>>>>>>>> Is there something else we can use? Maybe the view definition should use some intermediate structured language that's not SQL? For example, it could represent the logical structure of operations in a semantic manner. This would eliminate the need for cross-engine compatible parsing and analysis.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> PF
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 20, 2021 at 11:04 AM Ryan Murray <rym...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Anjali!
>>>>>>>>>>>
>>>>>>>>>>> I have left some comments on the document. I unfortunately have to miss the community meetup tomorrow but would love to chat more / help with the implementation.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Ryan
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 20, 2021 at 7:42 AM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> John Zhuge and I would like to propose the following spec for storing view metadata in Iceberg. The proposal has been implemented [1] and has been in production at Netflix for over 15 months.
>>>>>>>>>>>> https://docs.google.com/document/d/1wQt57EWylluNFdnxVxaCkCSvnWlI8vVtwfnPjQ6Y7aw/edit?usp=sharing
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://github.com/Netflix/iceberg/tree/netflix-spark-2.4/view/src/main/java/com/netflix/bdp/view
>>>>>>>>>>>>
>>>>>>>>>>>> Please let us know your thoughts by adding comments to the doc.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Anjali.
>
> --
> Ryan Blue
> Tabular

--
John Zhuge