Thanks for working on this, Anjali! It’s great to see the thorough
discussion here with everyone.

For the discussion about SQL dialect, I think the right first step is
to capture the SQL along with its dialect. That gives us the most
flexibility: engines that can use Coral for translation can attempt to
convert, and engines that can’t can check whether the SQL is valid for
them and use it directly.

I think that the idea to create a minimal IR is an interesting one, but it can
be added later. We will always need to record the SQL and dialect, even if
we translate to IR because users are going to configure views using SQL.
Use cases like showing view history or debugging need the original SQL,
plus relevant information like where it was created and the SQL dialect. We
should be able to add this later by adding additional metadata to the view
definition. I don’t think that it would introduce breaking changes to add a
common representation that can be optionally consumed.
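
To make that concrete, here is a rough sketch of the kind of metadata I
have in mind (field names are illustrative, not a final format):

    {
      "sql": "SELECT event_id, count(*) AS cnt FROM db.events GROUP BY event_id",
      "dialect": "spark"
    }

A common representation could later be added as an optional sibling field
without breaking readers that only understand the SQL and dialect.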

Let’s continue talking about a minimal IR separately. View translation is
a hard problem: right now, to get views across engines we have to
compromise on confidence. IR is a way to have strong confidence, but with
limited expressibility. I think that’s a good trade in a lot of cases and
is worth pursuing, even if it will take a long time.

Jacques makes a great point about types, but I think that the right option
here is to continue using Iceberg types. We’ve already had discussions
about whether Iceberg should support annotating types with engine-specific
ones, so we have a reasonable way to improve this while also providing
compatibility across engines: char(n) is not necessarily supported
everywhere, and mapping it to string will make sense in most places. The
schema is primarily used to validate that the data produced by the query
hasn’t changed, and that is more about the number of columns in structs and
the names of fields than about exact types. We can fix up types when
substituting without losing too much: if the SQL produces a varchar(10)
field that the view metadata says is a string, then it’s okay that it is
varchar(10). There is some loss in that we don’t know if it was originally
varchar(5), but I think that this is not going to cause too many issues.
Not all engines will even validate that the schema has not changed, since
it could be valid to use select * from x where ... and allow new fields to
appear.
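
To illustrate with the Iceberg schema JSON (a sketch, not normative): a
view whose SQL produces a varchar(10) column could be stored as

    {
      "type": "struct",
      "fields": [
        {"id": 1, "name": "name", "required": false, "type": "string"}
      ]
    }

and validation would check that the query still produces one field named
"name", not that the engine’s type is exactly string.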

Right now, I think we should move forward with the proposal as it is, and
pursue type annotations and possibly IR in parallel. Does that sound
reasonable to everyone?

Ryan

On Thu, Jul 29, 2021 at 7:50 AM Piotr Findeisen <pi...@starburstdata.com>
wrote:

> Hi Anjali,
>
> That's a nice summary.
>
> re dialect field. It shouldn't be much trouble to have it (or any other
> way to identify the application that created the view), and it might be useful.
> Why not make it required from the start?
>
> re "expanded/resolved SQL" -- i don't understand yet what we would put
> there, so cannot comment.
>
> I agree it's nice to get something out of the door, and I see how
> the current proposal fits some needs already.
> However, I am concerned about the proliferation of non-cross-engine
> compatible views if we do that.
>
> Also, if we later agree on any compatible approach (portable subset of
> SQL, engine-agnostic IR, etc.), then from the perspective of each engine,
> it would be a breaking change.
> Unless we make the compatible approach as expressive as the full power of SQL,
> some views that are possible to create in v1 will not be possible to create
> in v2.
> Thus, if v1 is "some SQL" and v2 is "something awesomely compatible", we
> may not be able to roll it out.
>
> > the convention of common SQL has been working for a majority of users.
> SQL features commonly used are column projections, simple filter
> application, joins, grouping and common aggregate and scalar functions. A
> few users occasionally would like to use Trino- or Spark-specific functions
> but are sometimes able to find a way to use a function that is common to
> both engines.
>
>
> It's an awesome summary of what constructs are necessary to be able to
> define useful views while also keeping them portable.
>
> To be able to express column projections, simple filter application,
> joins, grouping and common aggregate and scalar functions in a structured
> IR, how much effort do you think would be required?
> We didn't really talk about the downsides of a structured approach, other
> than that it looks complex.
> If we indeed estimate it as a multi-year effort, I wouldn't argue for
> it. Maybe I was overly optimistic, though.
>
>
> As Jack mentioned, for an engine-specific approach that's not supposed to be
> consumed by multiple engines, we may be better served by an approach that's
> outside of the Iceberg spec, like https://github.com/trinodb/trino/pull/8540.
>
>
> Best,
> PF
>
>
>
>
>
> On Thu, Jul 29, 2021 at 12:33 PM Anjali Norwood
> <anorw...@netflix.com.invalid> wrote:
>
>> Hi,
>>
>> Thank you for all the comments. I will try to address them all here
>> together.
>>
>>
>>    - @all Cross-engine compatibility of view definition: Multiple
>>    options such as engine-agnostic SQL or IR of some form have been
>>    mentioned.
>>    We can all agree that all of these options are non-trivial to
>>    design/implement (perhaps a multi-year effort based on the option chosen)
>>    and merit further discussion. I would like to suggest that we continue 
>> this
>>    discussion but target this work for the future (v2?). In v1, we can add an
>>    optional dialect field and an optional expanded/resolved SQL field that 
>> can
>>    be interpreted by engines as they see fit. V1 can unlock many use cases
>>    where views are accessed by a single engine, as well as multi-engine use
>>    cases where a (common) subset of SQL is supported. This proposal allows
>>    for desirable features such as versioning of views and a common format
>>    for storing view metadata, while allowing extensibility in the future. *Does
>>    anyone feel strongly otherwise?*
>>    - @Piotr  As for common views at Netflix, the restrictions on SQL are
>>    not enforced, but are advised as best practices. The convention of common
>>    SQL has been working for a majority of users. SQL features commonly used
>>    are column projections, simple filter application, joins, grouping and
>>    common aggregate and scalar functions. A few users occasionally would like
>>    to use Trino- or Spark-specific functions but are sometimes able to find a
>>    way to use a function that is common to both engines.
>>    - @Jacques and @Jack Iceberg data types are engine-agnostic and hence
>>    were picked for storing the view schema. Thinking further, the schema field
>>    should be made 'optional', since not all engines require it. (e.g. Spark
>>    does not need it and Trino uses it only for validation).
>>    - @Jacques Table references in the views can be arbitrary objects
>>    such as tables from other catalogs or elasticsearch tables etc. I will
>>    clarify it in the spec.
>>
>> I will work on incorporating all the comments in the spec and make the
>> next revision available for review soon.
>>
>> Regards,
>> Anjali.
>>
>>
>>
>>
>> On Tue, Jul 27, 2021 at 2:51 AM Piotr Findeisen <pi...@starburstdata.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks Jack and Jacques for sharing your thoughts.
>>>
>>> I agree that tracking dialect/origin is better than nothing.
>>> I think having a Map {dialect: sql} is not going to buy us much.
>>> That is, it would only be useful if there were some external app (or a human
>>> being) that would write those alternative SQLs for each dialect.
>>> Otherwise, I cannot imagine Spark writing SQL for Spark and Trino, or
>>> Trino writing SQL for Trino and Spark.
>>>
>>> Thanks Jacques for a good summary of SQL supporting options.
>>> While I like the idea of starting with the Trino SQL ANTLR grammar file
>>> (it's really well written and resembles the spec quite well), you made a good
>>> point that grammar is only part of the problem. Coercions, function
>>> resolution, dereference resolution, and table resolution are parts of query
>>> analysis that go beyond just grammar.
>>> In fact, column scoping rules -- while clearly defined by the spec --
>>> may easily differ between engines (pretty usual).
>>> That's why I would rather lean towards some intermediate representation
>>> that is *not* SQL and requires neither parsing (it is already structural) nor
>>> analysis (no scopes! no implicit coercions!).
>>> Before we embark on such a journey, it would be interesting to hear @Martin
>>> Traverso <mar...@starburstdata.com>'s thoughts on feasibility, though.
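>>>
>>> To sketch what I mean (purely illustrative, not a proposed format), a view
>>> like "SELECT id FROM db.t WHERE ts > 5" could be stored structurally as
>>>
>>>    {
>>>      "project": ["id"],
>>>      "filter": {"gt": ["ts", 5]},
>>>      "scan": "db.t"
>>>    }
>>>
>>> with names and types already resolved, so engines only translate, and
>>> never parse or analyze.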
>>>
>>>
>>> Best,
>>> PF
>>>
>>>
>>>
>>>
>>> On Fri, Jul 23, 2021 at 3:27 AM Jacques Nadeau <jacquesnad...@gmail.com>
>>> wrote:
>>>
>>>> Some thoughts...
>>>>
>>>>    - In general, many engines want (or may require) a resolved SQL
>>>>    field. This--at minimum--typically includes star expansion, since
>>>>    traditional view behavior is that stars are expanded at view creation time
>>>>    (this is the only way to guarantee that the view returns the same
>>>>    logical definition even if the underlying table changes). This may also
>>>>    include replacing relative object names with absolute object names
>>>>    based on the session catalog & namespace. If I recall correctly, Hive
>>>>    does both of these things (see the sketch after this list).
>>>>    - It isn't clear in the spec whether the table references used in
>>>>    views are restricted to other Iceberg objects or can be arbitrary 
>>>> objects
>>>>    in the context of a particular engine. Maybe I missed this? For example,
>>>>    can I have a Trino engine view that references an Elasticsearch table
>>>>    stored in an Iceberg view?
>>>>    - Restricting schemas to the Iceberg types will likely lead to
>>>>    unintended consequences. I appreciate the attraction to it but I think 
>>>> it
>>>>    would either create artificial barriers around the types of SQL that are
>>>>    allowed and/or mean that replacing a CTE with a view could potentially
>>>>    change the behavior of the query which I believe violates most typical
>>>>    engine behaviors. A good example of this is the simple sql statement of
>>>>    "SELECT c1, 'foo' as c2 from table1". In many engines (and Calcite by
>>>>    default I believe), c2 will be specified as a CHAR(3). In this Iceberg
>>>>    context, is this view disallowed? If it isn't disallowed then you have 
>>>> an
>>>>    issue where the view schema will be required to be different from a CTE
>>>>    since the engine will resolve it differently than Iceberg. Even if you
>>>>    ignore CHAR(X), you've still got VARCHAR(X) to contend with...
>>>>    - It is important to remember that Calcite is a set of libraries
>>>>    and not a specification. There are things that can be specified in 
>>>> Calcite
>>>>    but in general it doesn't have formal specification as a first 
>>>> principle.
>>>>    It is more implementation as a first principle. This is in contrast to
>>>>    projects like Arrow and Iceberg, which start with well-formed
>>>>    specifications. I've been working with Calcite since before it was an
>>>>    Apache project and I wouldn't recommend adopting it as any form of a
>>>>    specification. On the flipside, I am very supportive of using it for a
>>>>    reference implementation standard for Iceberg view consumption,
>>>>    manipulation, etc.  If anything, I'd suggest we start with the adoption 
>>>> of
>>>>    a relatively clear grammar, e.g. the Antlr grammar file that Spark [1]
>>>>    and/or Trino [2] use. Even that is not a complete specification as 
>>>> grammar
>>>>    must still be interpreted with regards to type promotion, function
>>>>    resolution, consistent unnamed expression naming, etc that aren't 
>>>> defined
>>>>    at the grammar level. I'd definitely avoid using Calcite's JavaCC 
>>>> grammar
>>>>    as it heavily embeds implementation details (in a good way) and relies 
>>>> on
>>>>    some fairly complex logic in the validator and sql2rel components to be
>>>>    fully resolved/comprehended.
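>>>>
>>>> To sketch the resolved-SQL point above (hypothetical field names), a view
>>>> created as "SELECT * FROM t" with session catalog/namespace prod.db could
>>>> be stored with both forms:
>>>>
>>>>    {
>>>>      "original-sql": "SELECT * FROM t",
>>>>      "resolved-sql": "SELECT t.c1, t.c2 FROM prod.db.t"
>>>>    }
>>>>
>>>> so the view keeps returning c1 and c2 even if columns are later added to t.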
>>>>
>>>> Given the above, I suggest having a field which describes the
>>>> dialect (origin?) of the view, and then each engine can decide how it wants
>>>> to consume/mutate that view (and whether it wants to or not). It does risk
>>>> being a dumping ground. Nonetheless, I'd expect the alternative of
>>>> establishing a formal SQL specification to be a similarly long process to
>>>> the couple of years it took to build the Arrow and Iceberg specifications.
>>>> (Realistically, there is far more to specify here than there is in either
>>>> of those two domains.)
>>>>
>>>> Some other notes:
>>>>
>>>>    - Calcite does provide a nice reference document [3] but it is not
>>>>    sufficient to implement what is necessary for 
>>>> parsing/validating/resolving
>>>>    a SQL string correctly/consistently.
>>>>    - Projects like Coral [4] are interesting here but even Coral is
>>>>    based roughly on "HiveQL" which also doesn't have a formal specification
>>>>    process outside of the Hive version you're running. See this thread in
>>>>    Coral slack [5]
>>>>    - ZetaSQL [6] also seems interesting in this space. It feels closer
>>>>    to specification-based [7] than Calcite but is much less popular in the
>>>>    big data domain. I also haven't reviewed its SQL completeness closely,
>>>>    which is a strength of Calcite.
>>>>    - One of the other problems with building against an implementation
>>>>    as opposed to a specification (e.g. Calcite) is it can make it 
>>>> difficult or
>>>>    near impossible to implement the same algorithms again without a bunch 
>>>> of
>>>>    reverse engineering. If interested in an example of this, see the
>>>>    discussion behind LZ4 deprecation on the Parquet spec [8] for how 
>>>> painful
>>>>    this kind of mistake can become.
>>>>    - I'd love to use the SQL specification itself but nobody actually
>>>>    implements that in its entirety and it has far too many places where 
>>>> things
>>>>    are "implementation-defined" [9].
>>>>
>>>> [1]
>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
>>>> [2]
>>>> https://github.com/trinodb/trino/blob/master/core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4
>>>> [3] https://calcite.apache.org/docs/reference.html
>>>> [4] https://github.com/linkedin/coral
>>>> [5] https://coral-sql.slack.com/archives/C01FHBJR20Y/p1608064004073000
>>>> [6] https://github.com/google/zetasql
>>>> [7] https://github.com/google/zetasql/blob/master/docs/one-pager.md
>>>> [8]
>>>> https://github.com/apache/parquet-format/blob/master/Compression.md#lz4
>>>> [9] https://twitter.com/sc13ts/status/1413728808830525440
>>>>
>>>> On Thu, Jul 22, 2021 at 12:01 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> I did not notice that we are also discussing cross-engine
>>>>> interoperability here, so I will add my response from the design doc here.
>>>>>
>>>>> I would personally prefer cross-engine interoperability as a goal and
>>>>> get the spec in the right structure in the initial release, because:
>>>>>
>>>>> 1. I believe that cross-engine compatibility is a critical feature of
>>>>> Iceberg. If I am a user of an existing data lake that already supports
>>>>> views (e.g. Hive), I don't even need Iceberg to have this view feature. I
>>>>> can do what is now done for Trino to use views with Iceberg. I can also
>>>>> just use a table property to indicate the table is a view and store the
>>>>> view SQL as a table property and do my own thing in any query engine to
>>>>> support all the view features. One of the most valuable and unique 
>>>>> features
>>>>> that Iceberg views can unlock is to allow a view to be created in one
>>>>> engine
>>>>> and read by another. Not supporting cross-engine compatibility feels like
>>>>> losing a lot of value to me.
>>>>>
>>>>> 2. In the view definition, it feels inconsistent to me that we have
>>>>> "schema" as an Iceberg-native schema, but the "sql" field as view SQL that
>>>>> can come from any query engine. If the engine already needs to convert the
>>>>> view schema to an Iceberg schema, it should just do the same for the view SQL.
>>>>>
>>>>> Regarding the way to achieve it, I think it comes down to either Apache
>>>>> Calcite (or some other third-party alternative I don't know of) or our own
>>>>> implementation of some intermediate representation. I don't have a very
>>>>> strong opinion, but my thoughts are the following:
>>>>>
>>>>> 1. Calcite is supposed to be the go-to software to deal with this kind
>>>>> of issue, but my personal concern is that the integration is definitely
>>>>> going to be much more involved, and it will become another barrier for
>>>>> newer engines to onboard, because an engine not only needs to implement the
>>>>> Iceberg APIs but also needs Calcite support. It will also start to become a
>>>>> constant discussion around what we maintain and what we should push to
>>>>> Calcite, similar to our situation today with Spark.
>>>>>
>>>>> 2. Another way I am leaning towards, as Piotr also suggested, is to
>>>>> have a native lightweight logical query structure representation of the
>>>>> view SQL and store that instead of the SQL string. We already deal with
>>>>> Expressions in Iceberg, and engines have to convert predicates to Iceberg
>>>>> expressions for predicate pushdown. So I think it would not be hard to
>>>>> extend that to support this use case. Different engines can build this
>>>>> logical structure when traversing their own AST during a create view 
>>>>> query.
>>>>>
>>>>> 3. With these considerations, I think the "sql" field can potentially
>>>>> be a map (maybe called "engine-sqls"?), where the key is the engine type and
>>>>> version like "Spark 3.1", and the value is the view SQL string. In this way,
>>>>> the engine that creates the view can still read the SQL directly which
>>>>> might lead to better engine-native integration and avoid redundant 
>>>>> parsing.
>>>>> But in this approach there is always a default intermediate representation
>>>>> it can fall back to when the engine's key is not found in the map. If we
>>>>> want to make incremental progress and delay the design for the 
>>>>> intermediate
>>>>> representation, I think we should at least use this map instead of just a
>>>>> single string.
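>>>>>
>>>>> As a sketch of that shape (keys and the fallback entry are illustrative,
>>>>> not a proposal for exact names):
>>>>>
>>>>>    {
>>>>>      "engine-sqls": {
>>>>>        "spark-3.1": "SELECT id, name FROM db.users",
>>>>>        "trino-356": "SELECT id, name FROM db.users",
>>>>>        "default": "(intermediate representation would go here)"
>>>>>      }
>>>>>    }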
>>>>>
>>>>> Thanks,
>>>>> Jack Ye
>>>>>
>>>>> On Thu, Jul 22, 2021 at 6:35 AM Piotr Findeisen <
>>>>> pi...@starburstdata.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> First of all thank you for this discussion and all the view-related
>>>>>> work!
>>>>>>
>>>>>> I agree that solving the cross-engine compatibility problem may not be a
>>>>>> primary feature today, but I am concerned that not thinking about this
>>>>>> from the start may "tunnel" us in the wrong direction.
>>>>>> Cross-engine compatible views would be such a cool feature that it is
>>>>>> hard to just let it pass.
>>>>>>
>>>>>> My thinking about a smaller IR may be a side-product of me not being
>>>>>> familiar enough with Calcite.
>>>>>> However, a new IR focused on a compatible representation, and not tied
>>>>>> to anything else, would actually be a good thing.
>>>>>> For example, we would only need to focus on the JSON representation,
>>>>>> without dealing with tree traversal or anything, so the code for this
>>>>>> could be pretty simple.
>>>>>>
>>>>>> >  Allow only ANSI-compliant SQL and anything that is truly common
>>>>>> across engines in the view definition (this is how currently Netflix uses
>>>>>> these 'common' views across Spark and Trino)
>>>>>>
>>>>>> It's interesting. Anjali, do you have means to enforce that, or is
>>>>>> this just a convention?
>>>>>>
>>>>>> What are the common building blocks (relational operations,
>>>>>> constructs and functions) that you found sufficient for expressing your
>>>>>> views?
>>>>>> Being able to enumerate them could help validate various approaches
>>>>>> considered here, including feasibility of dedicated representation.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> PF
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 22, 2021 at 2:28 PM Ryan Murray <rym...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey Anjali,
>>>>>>>
>>>>>>> I am definitely happy to help with implementing 1-3 in your first
>>>>>>> list once the spec has been approved by the community. My hope is that 
>>>>>>> the
>>>>>>> final version of the view spec will make it easy to re-use existing
>>>>>>> rollback/time travel/metadata etc functionalities.
>>>>>>>
>>>>>>> Regarding SQL dialects. My personal opinion is: enforcing
>>>>>>> ANSI-compliant SQL across all engines is hard and probably not
>>>>>>> desirable, while storing a Calcite AST makes it hard for e.g. Python to
>>>>>>> use views. A project to make a cross-language and cross-engine IR for
>>>>>>> SQL views and the relevant transpilers is IMHO outside the scope of
>>>>>>> this spec and probably deserving of an Apache project of its own. A
>>>>>>> smaller IR like Piotr suggested is possible, but I feel it will likely
>>>>>>> quickly snowball into a larger project and slow down adoption of the
>>>>>>> view spec in Iceberg. So I think the most reasonable way forward is to
>>>>>>> add a dialect field and a warning to engines that views are not (yet)
>>>>>>> cross-compatible. This is at odds with the original spirit of Iceberg
>>>>>>> tables and I wonder how the broader community feels about it? I would
>>>>>>> hope that we can make the view spec engine-free over time and
>>>>>>> eventually deprecate the dialect field.
>>>>>>>
>>>>>>> Best,
>>>>>>> Ryan
>>>>>>>
>>>>>>> PS: if anyone is interested in collaborating on engine-agnostic views,
>>>>>>> please reach out. I am keen on exploring this topic.
>>>>>>>
>>>>>>> On Tue, Jul 20, 2021 at 10:51 PM Anjali Norwood
>>>>>>> <anorw...@netflix.com.invalid> wrote:
>>>>>>>
>>>>>>>> Thank you Ryan (M), Piotr and Vivekanand for the comments. I have been
>>>>>>>> addressing them in the doc and will continue to do so.
>>>>>>>> Great to know about Trino views, Piotr!
>>>>>>>>
>>>>>>>> Thanks to everybody who has offered help with implementation. The
>>>>>>>> spec as it is proposed in the doc has been implemented and is in use at
>>>>>>>> Netflix (currently on Iceberg 0.9). Once we close the spec, we will 
>>>>>>>> rebase
>>>>>>>> our code to Iceberg 0.12, incorporate changes to the format and
>>>>>>>> other feedback from the community, and should be able to make this MVP
>>>>>>>> implementation available quickly as a PR.
>>>>>>>>
>>>>>>>> A few areas that we have not yet worked on and would love for the
>>>>>>>> community to help are:
>>>>>>>> 1. Time travel on views: Be able to access the view as of a version
>>>>>>>> or time
>>>>>>>> 2. History table: A system table implementation for $versions
>>>>>>>> similar to the $snapshots table in order to display the history of a 
>>>>>>>> view
>>>>>>>> 3. Rollback to a version: A way to rollback a view to a previous
>>>>>>>> version
>>>>>>>> 4. Engine agnostic SQL: more below.
>>>>>>>>
>>>>>>>> One comment that is worth a broader discussion is the dialect of
>>>>>>>> the SQL stored in the view metadata. The purpose of the spec is to 
>>>>>>>> provide
>>>>>>>> a storage format for view metadata and APIs to access that metadata. 
>>>>>>>> The
>>>>>>>> dialect of the SQL stored is an orthogonal question and is outside the
>>>>>>>> scope of this spec.
>>>>>>>>
>>>>>>>> Nonetheless, it is an important concern, so compiling a few
>>>>>>>> suggestions that came up in the comments to continue the discussion:
>>>>>>>> 1. Allow only ANSI-compliant SQL and anything that is truly common
>>>>>>>> across engines in the view definition (this is how currently Netflix 
>>>>>>>> uses
>>>>>>>> these 'common' views across Spark and Trino)
>>>>>>>> 2. Add a field to the view metadata to identify the dialect of the
>>>>>>>> SQL. This allows for any desired dialect, but no improved cross-engine
>>>>>>>> interoperability
>>>>>>>> 3. Store AST produced by Calcite in the view metadata and translate
>>>>>>>> back and forth between engine-supported SQL and AST
>>>>>>>> 4. Intermediate structured language of our own. (What additional
>>>>>>>> functionality does it provide over Calcite?)
>>>>>>>>
>>>>>>>> Given that the view metadata is JSON, it is easily extensible to
>>>>>>>> incorporate any new fields needed to make the SQL truly compatible 
>>>>>>>> across
>>>>>>>> engines.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> regards,
>>>>>>>> Anjali
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 20, 2021 at 3:09 AM Piotr Findeisen <
>>>>>>>> pi...@starburstdata.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> FWIW, in Trino we just added Trino views support.
>>>>>>>>> https://github.com/trinodb/trino/pull/8540
>>>>>>>>> Of course, this is by no means usable by other query engines.
>>>>>>>>>
>>>>>>>>> Anjali, your document does not talk much about compatibility
>>>>>>>>> between query engines.
>>>>>>>>> How do you plan to address that?
>>>>>>>>>
>>>>>>>>> For example, I am familiar with Coral, and I appreciate its powers
>>>>>>>>> for dealing with legacy stuff like views defined by Hive.
>>>>>>>>> I treat it as a great technology supporting transitioning from a
>>>>>>>>> query engine to a better one.
>>>>>>>>> However, I would not base a design of some new system for storing
>>>>>>>>> cross-engine compatible views on that.
>>>>>>>>>
>>>>>>>>> Is there something else we can use?
>>>>>>>>> Maybe the view definition should use some intermediate structured
>>>>>>>>> language that's not SQL?
>>>>>>>>> For example, it could represent the logical structure of operations in
>>>>>>>>> a semantic manner.
>>>>>>>>> This would eliminate the need for cross-engine compatible parsing and
>>>>>>>>> analysis.
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> PF
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 20, 2021 at 11:04 AM Ryan Murray <rym...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Anjali!
>>>>>>>>>>
>>>>>>>>>> I have left some comments on the document. I unfortunately have
>>>>>>>>>> to miss the community meetup tomorrow but would love to chat 
>>>>>>>>>> more/help w/
>>>>>>>>>> implementation.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 20, 2021 at 7:42 AM Anjali Norwood
>>>>>>>>>> <anorw...@netflix.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> John Zhuge and I would like to propose the following spec for
>>>>>>>>>>> storing view metadata in Iceberg. The proposal has been implemented [1]
>>>>>>>>>>> and has been in production at Netflix for over 15 months.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://docs.google.com/document/d/1wQt57EWylluNFdnxVxaCkCSvnWlI8vVtwfnPjQ6Y7aw/edit?usp=sharing
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/Netflix/iceberg/tree/netflix-spark-2.4/view/src/main/java/com/netflix/bdp/view
>>>>>>>>>>>
>>>>>>>>>>> Please let us know your thoughts by adding comments to the doc.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anjali.
>>>>>>>>>>>
>>>>>>>>>>

-- 
Ryan Blue
Tabular
