Re: Materialized Views: Next Steps

Daniel Weeks Mon, 20 May 2024 11:14:44 -0700

I know I'm coming in late here, but I'm still working through all the prior
discussion.  Here are my thoughts so far:


I agree that Jan's doc has some really good context and we should continue
from there, but we should remove discussion of options as it just creates
confusion.  We can reference other material from previous discussions
(mailing lists, other docs/prior version, etc), but the focus should be on
what we feel the final product will look like as opposed to the path it
took to get there.  Let's make sure that the proposal issue
<https://github.com/apache/iceberg/issues/10043> is kept up to date and is
the primary reference point going forward.

As to the question of using properties or updating the view metadata, I
agree with Szehon and others who expressed concern about using properties.
There are a number of issues I see with standardizing around properties in
the table or view definition.  The first is that some of the structures may
get complicated.  I'm specifically concerned about lineage information as
there can be many table references and possibly even view references that
will need to contain additional information.  The second issue is that
table and view properties persist across versions, so rollback and even
handling of those properties would require careful implementation.  The
third issue is that once you've standardized on properties, it is difficult
to change direction and any future changes will likely need to be handled
through properties as well.  This creates an awkward scenario where you're
building a specification around something that isn't particularly well
defined or structured.

I also don't feel there's a significant difference in the effort involved
in updating the metadata representations because we need to define the same
properties regardless (most of the work is around the handling, not how the
values are defined).  While using properties may seem expedient, I don't
think it's really going to move things along much faster and will be more
difficult in the long-term.  The one caveat is that I believe we may be
able to limit the changes to the view spec and avoid needing a table spec
update.  I've added my comments to the doc to that effect.

-Dan



On Fri, May 17, 2024 at 3:16 PM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> I am not in favor of expanding the spec for use cases that do not directly
> serve materialized views. Identifying general lineage is a separate problem
> that is also applicable to non-materialized views so maybe that’s worth
> discussing in a separate spec. If there is a use case for timestamp or
> snapshot level properties for materialize views, we can discuss them but so
> far I feel they are redundant. What do you think?
>
> Thanks,
> Walaa.
>
> On Fri, May 17, 2024 at 3:05 PM Benny Chow <btc...@gmail.com> wrote:
>
>> I think it’s still worthwhile to include the snapshot and timestamp refs
>> for completeness sake.
>>
>> Also, Jan brought up interesting use case with BI tool using the MV
>> without SQL representation.  The BI tool can get all table and view
>> dependencies if the lineage is complete.
>>
>> Thanks
>>
>>
>> On May 17, 2024, at 1:35 PM, Walaa Eldin Moustafa <wa.moust...@gmail.com>
>> wrote:
>>
>> 
>>
>> Sounds good. I am assuming we agree it is not required for either
>> snapshot or timestamp?
>>
>> Thanks,
>> Walaa.
>>
>>
>> On Fri, May 17, 2024 at 1:17 PM Benny Chow <btc...@gmail.com> wrote:
>>
>>> I like Jack's suggestions to capture the ref type and value!  When the
>>> ref type is branch, the snapshot id is dynamic and so the engine using the
>>> MV can validate that the latest snapshot on a branch matches the branch
>>> snapshot at the time of materialization.
>>>
>>> I think if we do this then we don't need to precisely identify the same
>>> table (at different snapshots) in the MV's query tree.  So, we don't need
>>> to capture any additional properties like alias, parent view, path to root,
>>> sequence number, etc.
>>>
>>> Thanks
>>> Benny
>>>
>>> On Fri, May 17, 2024 at 11:20 AM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> Thanks Jack, and welcome back!
>>>>
>>>> Taking a step back, I understand the initial concern was that a table
>>>> name (e.g., t1 in your example) would be referenced multiple times in the
>>>> view definition and each reference is associated with a different snapshot
>>>> ID, hence UUID is not sufficient to capture each occurrence/reference. I
>>>> proposed:
>>>>
>>>> * The solution to track unique occurrences is to use something along
>>>> the lines of the SQL alias (e.g., "t1" for the first occurrence and "t2"
>>>> from "as t2" in your example) to uniquely identify each occurrence -- we
>>>> can tweak the representation and explore how to handle this in case of
>>>> nested queries, etc, but alias is the main concept to track uniqueness.
>>>> * However, since this leads to a series of open ended problems, I have
>>>> also suggested avoiding this complexity altogether and not supporting time
>>>> travel in MVs for now.
>>>>
>>>> However, thinking again, are not time travel queries in MVs
>>>> self-containing the exact snapshot ID that we are trying to track in the
>>>> properties? Looks like this information is already encoded in the query and
>>>> there is no need to capture it externally.
>>>>
>>>> For example, if the MV definition consists of table references where
>>>> all of the references are bound to specific snapshot IDs or timestamps,
>>>> then the storage table is always fresh no matter if the underlying tables
>>>> change. Tracking snapshot IDs in the storage table is only required for
>>>> table references that are not pinned to a specific snapshot ID/timestamp in
>>>> the view definition, for which UUID is sufficient.
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>>
>>>> On Fri, May 17, 2024 at 9:51 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> Hi everyone, just want to say I am back from the leave, and currently
>>>>> catching up with the threads, I will make more comments later after 
>>>>> knowing
>>>>> more details of what has been going on. Looks like we've made great
>>>>> progress!
>>>>>
>>>>> Just my 2 cents on the current properties vs metadata field
>>>>> discussion. The proposed properties are:
>>>>> - in view:
>>>>>   1. a boolean flag to indicate a view is a MV
>>>>>   2. a pointer to the storage table
>>>>> - in storage table:
>>>>>   3. view version that is materialized
>>>>>   4. a prefix-based map to describe the snapshot version of the base
>>>>> tables that are materialized
>>>>>   5. a prefix-based map to describe the version of child views that
>>>>> are materialized
>>>>>
>>>>> For 1, 2, and 3, these are all pretty simple and can be just
>>>>> properties. I guess 4 and 5 are the main ones that seem complex and can be
>>>>> more formalized as metadata fields. I think the time travel cases Bunny
>>>>> brought up might be good ones to look into more details:
>>>>>
>>>>> For direct version travel, I think the base table version serves as
>>>>> the default. If you have a MV query like
>>>>>
>>>>> SELECT * FROM
>>>>>   t1,
>>>>>   t1 FOR SYSTEM_VERSION AS OF 987654 as t2
>>>>>   WHERE t1.c1 = t2.c1
>>>>>
>>>>> and in the storage table it says t1 maps to snapshot id 123456, then
>>>>> the query is still not ambiguous, it should be interpreted as
>>>>>
>>>>> SELECT * FROM
>>>>>   t1 FOR SYSTEM_VERSION AS OF 123456,
>>>>>   t1 FOR SYSTEM_VERSION AS OF 987654 as t2
>>>>>   WHERE t1.c1 = t2.c1
>>>>>
>>>>> For ref travel, the specific ref version needs to be fixed at MV
>>>>> creation time:
>>>>>
>>>>> SELECT * FROM
>>>>>   t1,
>>>>>   t1 FOR SYSTEM_VERSION AS OF '2024-Q1' as t2
>>>>>   WHERE t1.c1 = t2.c1
>>>>>
>>>>> Just storing table UUID is not sufficient. In a property-based
>>>>> approach, we need something like
>>>>> base.table.<table>.ref.<ref-name>=<snapshot-id>.
>>>>>
>>>>> Time travel is similar to ref travel:
>>>>>
>>>>> SELECT * FROM
>>>>>   t1,
>>>>>   t1 FOR SYSTEM_TIME AS OF timestamp '2024-01-01' as t2
>>>>>   WHERE t1.c1 = t2.c1
>>>>>
>>>>> In a property-based approach, we need something like
>>>>> base.table.<table>.time.<timestamp>=<snapshot-id>.
>>>>>
>>>>> Technically this is indeed getting increasingly complex, so I can get
>>>>> why many of us say this property-based approach is quite brittle. However,
>>>>> it seems like it can still work as we extend the property structure.
>>>>> Personally speaking I am leaning more towards the property-based approach
>>>>> just for its simplicity, but I need to think more about other use cases as
>>>>> well.
>>>>>
>>>>> Best,
>>>>> Jack Ye
>>>>>
>>>>>
>>>>> On Thu, May 16, 2024 at 10:21 PM Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> I think this is orthogonal to the property vs metadata field since
>>>>>> instead of representing the property as `base.table.[UUID]` it would be
>>>>>> something like `base.table.[alias]` where `alias` is the specific
>>>>>> occurrence of the table in the query according to its alias (and SELECT
>>>>>> scope possibly, which kind of opens the door to further complexities, but
>>>>>> for the sake of argument -- there is a mapping to properties too).
>>>>>>
>>>>>> Another question: assuming we go with the top level metadata model,
>>>>>> will we still expose this metadata on the engine side as properties? What
>>>>>> would the property names be?
>>>>>>
>>>>>> Thanks,
>>>>>> Walaa.
>>>>>>
>>>>>>
>>>>>> On Thu, May 16, 2024 at 9:55 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>>
>>>>>>> Sounds good.
>>>>>>>
>>>>>>> Another benefit of the struct model is that it's more extensible in
>>>>>>> the future when we need to disambiguate the same table that appears
>>>>>>> multiple times in the MV query tree.
>>>>>>> This could happen with time travel queries or branching.  We may end
>>>>>>> up adding additional properties like a sequence number, parent view or 
>>>>>>> path
>>>>>>> to root.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, May 16, 2024 at 3:57 PM Walaa Eldin Moustafa <
>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Benny, I have responded to the comment.
>>>>>>>>
>>>>>>>> I would suggest that we use this thread to evaluate properties
>>>>>>>> model vs top level metadata model (to avoid discussion drift).
>>>>>>>>
>>>>>>>> If we have feedback on the actual properties used in the properties
>>>>>>>> model as defined in the PR, we can have the discussion there.
>>>>>>>>
>>>>>>>> THanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 16, 2024 at 3:22 PM Benny Chow <btc...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Walaa
>>>>>>>>>
>>>>>>>>> I left comments in your spec PR:
>>>>>>>>> https://github.com/apache/iceberg/pull/10280#pullrequestreview-2061922169
>>>>>>>>>  My last question about use cases was really about incremental 
>>>>>>>>> refresh with
>>>>>>>>> aggregates.  But I think this might be too complicated to try to
>>>>>>>>> model/discuss now and so I agree with Micah's comment about doing it 
>>>>>>>>> in a
>>>>>>>>> future iteration.
>>>>>>>>>
>>>>>>>>> Hi Jan,
>>>>>>>>>
>>>>>>>>> Regarding storing the identifiers, I like the idea too.  Dremio's
>>>>>>>>> query engine supports MVs on sources besides Iceberg tables.  Here's
>>>>>>>>> everything that's in a single lineage entry:
>>>>>>>>> https://github.com/dremio/dremio-oss/blob/master/services/accelerator/src/main/protobuf/reflection.proto#L80
>>>>>>>>> The lineage is stored as a graph and not a list of entries.  I think 
>>>>>>>>> for
>>>>>>>>> what we are trying to achieve, it's more practical to limit the 
>>>>>>>>> lineage to
>>>>>>>>> Iceberg sources.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Benny
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 15, 2024 at 12:06 AM Jan Kaul
>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> I agree with Szehon and Benny that storing the lineage
>>>>>>>>>> information as multiple table properties is too brittle, especially 
>>>>>>>>>> for
>>>>>>>>>> many source tables (base tables). I would prefer to have the whole 
>>>>>>>>>> lineage
>>>>>>>>>> information in one entry as it is more concise. This is also how 
>>>>>>>>>> Trino has
>>>>>>>>>> been doing it, as you can see here
>>>>>>>>>> <https://github.com/trinodb/trino/blob/212455d3e1d393f58cbc395d2b9da47ed8f23dd8/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L2915>
>>>>>>>>>> .
>>>>>>>>>>
>>>>>>>>>> As we've discussed in the google doc
>>>>>>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit#heading=h.60qmzug7bzxc>:
>>>>>>>>>> it is helpful to also store the table identifiers of the source 
>>>>>>>>>> tables to
>>>>>>>>>> enable clients to determine the freshness of the MV that don't 
>>>>>>>>>> understand
>>>>>>>>>> the SQL dialect of the MV definition, like other query engines, BI 
>>>>>>>>>> tools
>>>>>>>>>> and Dataframe libraries. This is also how Trino is doing it. That's 
>>>>>>>>>> why we
>>>>>>>>>> chose the design in the google doc.
>>>>>>>>>>
>>>>>>>>>> Storing the storage table identifier as a property might work.
>>>>>>>>>>
>>>>>>>>>> Thanks, Jan
>>>>>>>>>> On 15.05.24 02:38, Walaa Eldin Moustafa wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks Benny. My specific thoughts about the spec and the
>>>>>>>>>> properties are captured in the spec PR
>>>>>>>>>> https://github.com/apache/iceberg/pull/10280. The spec is also
>>>>>>>>>> implemented in the Spark implementation PR
>>>>>>>>>> https://github.com/apache/iceberg/pull/9830, and I believe this
>>>>>>>>>> follows the same nature of how the information was captured in 
>>>>>>>>>> Netflix's
>>>>>>>>>> implementation with Spark, and Trino implementation (prior to 
>>>>>>>>>> formalizing
>>>>>>>>>> through that spec), both of which have been used reliably for years. 
>>>>>>>>>> I
>>>>>>>>>> think it also aligns with Ryan's feedback here
>>>>>>>>>> https://github.com/apache/iceberg/issues/6420#issuecomment-1369280546
>>>>>>>>>>  which
>>>>>>>>>> indicated the usage of properties.
>>>>>>>>>>
>>>>>>>>>> The reasons for choosing properties:
>>>>>>>>>> * Not every table is a storage table and not every view is a
>>>>>>>>>> materialized view. I feel exposing the info as top level metadata is 
>>>>>>>>>> an
>>>>>>>>>> overkill for the original object type.
>>>>>>>>>> * The properties are simple. They contain either single snapshot
>>>>>>>>>> ID each, or single view version each, or lastly, the storage table
>>>>>>>>>> identifier. Engines can use them without issues (also as shown in the
>>>>>>>>>> implementation).
>>>>>>>>>> * To be meaningful, the metadata fields should be captured in the
>>>>>>>>>> engine API as well, which is an effort that has to be pursued 
>>>>>>>>>> outside the
>>>>>>>>>> Iceberg community. Until engine APIs evolve, we would have to define 
>>>>>>>>>> a
>>>>>>>>>> mapping between Iceberg metadata fields and engine properties (only 
>>>>>>>>>> current
>>>>>>>>>> place in engine side to capture this info). This requires an 
>>>>>>>>>> additional
>>>>>>>>>> spec on its own, and it will introduce complexities. Hence it is 
>>>>>>>>>> always
>>>>>>>>>> cleaner to map Iceberg properties to engine properties and Iceberg 
>>>>>>>>>> metadata
>>>>>>>>>> to designated engine APIs. Mixing and matching (e.g., Iceberg 
>>>>>>>>>> metadata
>>>>>>>>>> fields as engine properties) just for the lack of other cleaner 
>>>>>>>>>> options
>>>>>>>>>> does not sound like a good idea in both short and long term.
>>>>>>>>>>
>>>>>>>>>> Let me know your thoughts.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Walaa.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, May 14, 2024 at 5:12 PM Benny Chow <btc...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I agree with Szheon here.  I think storing the materialization
>>>>>>>>>>> lineage as a bunch of properties is brittle.  This lineage 
>>>>>>>>>>> information is
>>>>>>>>>>> needed by engines to validate the staleness of a materialization 
>>>>>>>>>>> and also
>>>>>>>>>>> to perform full or incremental refreshes.  There’s a lot to capture 
>>>>>>>>>>> here.
>>>>>>>>>>>
>>>>>>>>>>> Maybe we should drill down into the use cases first - such as
>>>>>>>>>>> incremental refresh with aggregates?  (Pick a harder one first 😀)
>>>>>>>>>>>
>>>>>>>>>>> I left a few comments about this in the doc.  I wonder what are
>>>>>>>>>>> your thoughts here Walaa?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On May 14, 2024, at 4:20 PM, Walaa Eldin Moustafa <
>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> 
>>>>>>>>>>> Thanks John. The current metadata does not sound complex. We
>>>>>>>>>>> need to track the underlying table snapshot IDs as well as the view 
>>>>>>>>>>> version
>>>>>>>>>>> ID. I agree as long as it is simple and before this feature fully 
>>>>>>>>>>> matures,
>>>>>>>>>>> we should track it in properties.
>>>>>>>>>>>
>>>>>>>>>>> One important factor for me (apart from the API effort,
>>>>>>>>>>> especially on the engine side), is that not each table is an MV 
>>>>>>>>>>> storage
>>>>>>>>>>> table. Surfacing MV-specific storage table properties as first 
>>>>>>>>>>> class table
>>>>>>>>>>> metadata sounds to impose this metadata on every table, when it is 
>>>>>>>>>>> not
>>>>>>>>>>> required for normal table operation (yes, they can be optional, but 
>>>>>>>>>>> it does
>>>>>>>>>>> not sound like the use case warrants exposing them as metadata 
>>>>>>>>>>> fields yet).
>>>>>>>>>>>
>>>>>>>>>>> Similarly, since not every view is a materialized view, it
>>>>>>>>>>> sounds reasonable to keep MV-specific data in properties.
>>>>>>>>>>>
>>>>>>>>>>> UUID (for views), on the other hand, is common to all views,
>>>>>>>>>>> hence it made sense to add it as a top level field.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Walaa.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 14, 2024 at 1:01 PM John Zhuge <jzh...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Szheon,
>>>>>>>>>>>>
>>>>>>>>>>>> While I fully share your concern of abusing table properties,
>>>>>>>>>>>> we took the approach of option 1 and run it in production for 
>>>>>>>>>>>> several years:
>>>>>>>>>>>>
>>>>>>>>>>>>    - the feature was still evolving
>>>>>>>>>>>>    - quick and simple implementation
>>>>>>>>>>>>    - table properties are simple enough and not confusing
>>>>>>>>>>>>    - haven't seen an urgent need to convert the properties to
>>>>>>>>>>>>    metadata fields and add API
>>>>>>>>>>>>    - do not wish on-disk changes (requiring lengthy
>>>>>>>>>>>>    tedious migration)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> That said, I am open to codifying the mv metadata into api and
>>>>>>>>>>>> spec, with the following considerations
>>>>>>>>>>>>
>>>>>>>>>>>>    - mv metadata has reached maturity and consensus (could be
>>>>>>>>>>>>    just a core portion)
>>>>>>>>>>>>    - when mv metadata becomes too complex
>>>>>>>>>>>>    - wish to support use cases that are quicker to adopt API
>>>>>>>>>>>>    changes (than engines), e.g., using Iceberg library to 
>>>>>>>>>>>> manipulate MVs, or
>>>>>>>>>>>>    parsing metadata files directly
>>>>>>>>>>>>    - Spark view catalog API can evolve separately from Iceberg
>>>>>>>>>>>>    API and spec changes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks all for the great discussion!
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 10, 2024 at 10:48 PM Walaa Eldin Moustafa <
>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Szheon,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the follow-up. It is possible some of the concerns
>>>>>>>>>>>>> were referring to the backend catalogs, but it is all connected. 
>>>>>>>>>>>>> My main
>>>>>>>>>>>>> personal concern is from the engine connector APIs point of view, 
>>>>>>>>>>>>> but I
>>>>>>>>>>>>> share the concern about the catalogs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think everyone's concern is not about the complexity* per*
>>>>>>>>>>>>> backend catalog/engine catalog API (in which case adding new 
>>>>>>>>>>>>> metadata to
>>>>>>>>>>>>> existing objects could require less code), but rather about the
>>>>>>>>>>>>> *number* of APIs and implementations that need to change (in
>>>>>>>>>>>>> which case both new metadata to existing objects and new objects 
>>>>>>>>>>>>> altogether
>>>>>>>>>>>>> introduce equal complexity).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 10, 2024 at 10:31 AM Szehon Ho <
>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Walaa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> OK thanks for confirming.  I am still not 100% in agreement,
>>>>>>>>>>>>>> my understanding of the rationale for separate Table/View 
>>>>>>>>>>>>>> objects in the
>>>>>>>>>>>>>> comment that you linked:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the biggest problem with this is that we would need
>>>>>>>>>>>>>>> to modify every catalog to support this combination and that 
>>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>>> really difficult.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is about JavaCatalogs /REST Catalog needing to to support
>>>>>>>>>>>>>> creating , persisting, and loading a MaterializedView object, 
>>>>>>>>>>>>>> which is much
>>>>>>>>>>>>>> more complex.  See HiveView PR for example :
>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/9852   We would have
>>>>>>>>>>>>>> to do the same exercise for persisting MV.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In our case though, there's not much complexity regardless of
>>>>>>>>>>>>>> approach ('properties' or new metadata fields), in terms of Java
>>>>>>>>>>>>>> Catalog/REST Catalog.  It's mostly pass-through to storage.  
>>>>>>>>>>>>>> Looks like you
>>>>>>>>>>>>>> are referring to Spark's View model in terms of complexity, 
>>>>>>>>>>>>>> which may be a
>>>>>>>>>>>>>> different story, but not sure if it is a good rationale to make 
>>>>>>>>>>>>>> Iceberg to
>>>>>>>>>>>>>> use 'properties' .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 'properties'  is for read/write configurations, not to save
>>>>>>>>>>>>>> metadatas.  To me, its also brittle to save important metadata, 
>>>>>>>>>>>>>> as it's not
>>>>>>>>>>>>>> in the defined schema.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A string to string map of table properties. This is used to
>>>>>>>>>>>>>>> control settings that affect reading and writing and is not 
>>>>>>>>>>>>>>> intended to be
>>>>>>>>>>>>>>> used for arbitrary metadata.  For example,
>>>>>>>>>>>>>>> commit.retry.num-retries is used to control the number of
>>>>>>>>>>>>>>> commit retries.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the other hand, the Draft Spec suggests to save `lineage`
>>>>>>>>>>>>>> as a modeled field on the Storage Table's snapshot metadata.  
>>>>>>>>>>>>>> This allows
>>>>>>>>>>>>>> you to 'time travel', 'branch', and have this metadata life cycle
>>>>>>>>>>>>>> integrated via normal snapshots lifecycle operations.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So that's my rationale.  Not sure if we can come to an
>>>>>>>>>>>>>> agreement over email though, and may need others to chime in as 
>>>>>>>>>>>>>> well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 9, 2024 at 11:58 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Szehon,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, you are reading the PR correctly, and interpreting the
>>>>>>>>>>>>>>> meaning of properties correctly. I think the reply you pasted 
>>>>>>>>>>>>>>> from Ryan
>>>>>>>>>>>>>>> refers to the same concept as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For the initial Google doc and the issue (by the way it is
>>>>>>>>>>>>>>> an issue, not a PR), yes both are proposing new metadata fields.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The references I made to the modeling doc [1, 2] are reasons
>>>>>>>>>>>>>>> why new APIs are not desired. The cons/concerns applicable to 
>>>>>>>>>>>>>>> new MV
>>>>>>>>>>>>>>> metadata apply by extension to new table and view metadata 
>>>>>>>>>>>>>>> fields.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The reason why new metadata adds complexity is that this new
>>>>>>>>>>>>>>> metadata needs to be propagated to the engine API. For example, 
>>>>>>>>>>>>>>> here is the
>>>>>>>>>>>>>>> ViewInfo [3] class in the Spark catalog, which is used in view 
>>>>>>>>>>>>>>> methods like
>>>>>>>>>>>>>>> createView. Its fields correspond with the Iceberg metadata. 
>>>>>>>>>>>>>>> Adding new
>>>>>>>>>>>>>>> Iceberg fields should be accompanied with new fields in the 
>>>>>>>>>>>>>>> engine
>>>>>>>>>>>>>>> catalog/connector APIs, which was a major reason for rejecting 
>>>>>>>>>>>>>>> the combined
>>>>>>>>>>>>>>> MV object model as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABK7e3QB4
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABIonvCGE
>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>> https://github.com/apache/spark/blob/2df494fd4e4e64b9357307fb0c5e8fc1b7491ac3/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewInfo.java#L45
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 11:30 PM Szehon Ho <
>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Walaa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As there may be confusion in the word 'properties', I want
>>>>>>>>>>>>>>>> to double check if we are talking about the same thing here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am reading your PR as adding lineage metadata as new
>>>>>>>>>>>>>>>> key/value pair under the storage Table's 'properties' field:
>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/blob/main/format/spec.md?plain=1#L677
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *optional* *optional* *properties* A string to string map
>>>>>>>>>>>>>>>> of table properties. This is used to control settings that 
>>>>>>>>>>>>>>>> affect reading
>>>>>>>>>>>>>>>> and writing and is not intended to be used for arbitrary 
>>>>>>>>>>>>>>>> metadata. For
>>>>>>>>>>>>>>>> example, commit.retry.num-retries is used to control the
>>>>>>>>>>>>>>>> number of commit retries.
>>>>>>>>>>>>>>>> and adding Storage Table pointer as key/value pair in the
>>>>>>>>>>>>>>>> View's 'properties' field:
>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/blob/main/format/view-spec.md?plain=1#L65
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *optional* properties A string to string map of view
>>>>>>>>>>>>>>>> properties [2]
>>>>>>>>>>>>>>>> Is that correct?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On the other hand, I was talking about adding this metadata
>>>>>>>>>>>>>>>> as actual fields, as is described in the Draft Spec of the 
>>>>>>>>>>>>>>>> Design Doc
>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A
>>>>>>>>>>>>>>>>  and
>>>>>>>>>>>>>>>> first PR https://github.com/apache/iceberg/issues/6420 .
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you mean, the vote means we cannot model new fields like
>>>>>>>>>>>>>>>> 'materialization' and 'lineage' as was proposed there ?    If 
>>>>>>>>>>>>>>>> that is the
>>>>>>>>>>>>>>>> interpretation, I am not sure I agree.  I dont fully see how 
>>>>>>>>>>>>>>>> new fields
>>>>>>>>>>>>>>>> adds more catalog implementation complexity over new key/value 
>>>>>>>>>>>>>>>> properties?
>>>>>>>>>>>>>>>> To me, the vote seemed to just rule out using a combined 
>>>>>>>>>>>>>>>> catalog object
>>>>>>>>>>>>>>>> (MaterializedView) in favor of re-using the Table and View 
>>>>>>>>>>>>>>>> metadata models,
>>>>>>>>>>>>>>>> not to prevent change to the Table and View model.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 10:05 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Szehon,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think choosing separate view + table objects precludes
>>>>>>>>>>>>>>>>> us from adding new metadata to table and view metadata. Here 
>>>>>>>>>>>>>>>>> is one
>>>>>>>>>>>>>>>>> relevant comment [1] from Ryan on the modeling doc, where his 
>>>>>>>>>>>>>>>>> point is that
>>>>>>>>>>>>>>>>> we want to avoid introducing new APIs since it requires 
>>>>>>>>>>>>>>>>> updating every
>>>>>>>>>>>>>>>>> catalog, and (quoting) even now, we have few implementations 
>>>>>>>>>>>>>>>>> that support
>>>>>>>>>>>>>>>>> views because of the problems updating back ends. Therefore, 
>>>>>>>>>>>>>>>>> one of the
>>>>>>>>>>>>>>>>> major reasons to avoid a new model with new metadata is to 
>>>>>>>>>>>>>>>>> avoid adding new
>>>>>>>>>>>>>>>>> metadata, which introduces this complexity. Here is another 
>>>>>>>>>>>>>>>>> similar comment
>>>>>>>>>>>>>>>>> from Renjie [2] on the cons listed for the combined object 
>>>>>>>>>>>>>>>>> approach.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Even Ryan's point on the MV issue that you referenced
>>>>>>>>>>>>>>>>> reads to me as he is supportive of the property model. Here 
>>>>>>>>>>>>>>>>> are some quotes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > We would still want some MV metadata in table
>>>>>>>>>>>>>>>>> *properties*.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > I recommend instead reusing the existing snapshot
>>>>>>>>>>>>>>>>> metadata structure to store what you need as snapshot 
>>>>>>>>>>>>>>>>> *properties*.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > First, I think we want to avoid keeping much state
>>>>>>>>>>>>>>>>> information in complex table *properties*.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Again, here, he is supportive of table properties, but
>>>>>>>>>>>>>>>>> wants to make sure that the information is simple.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > We may want additional metadata as well, like a UUID to
>>>>>>>>>>>>>>>>> ensure we have the right view. I don't think we have a UUID 
>>>>>>>>>>>>>>>>> in the view
>>>>>>>>>>>>>>>>> spec yet, but we could add one.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here, he is very specific when it comes to new metadata
>>>>>>>>>>>>>>>>> fields, and explicitly calls it out. That is the only new 
>>>>>>>>>>>>>>>>> metadata field in
>>>>>>>>>>>>>>>>> that reply and by now it is already supported. It is also not 
>>>>>>>>>>>>>>>>> MV-specific.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hope this addresses your question on the property vs new
>>>>>>>>>>>>>>>>> metadata model.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABK7e3QB4
>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABIonvCGE
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 5:49 PM Szehon Ho <
>>>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Walaa,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I agree, I definitely do not want yet another pr/doc
>>>>>>>>>>>>>>>>>> where discussion happens. as its already quite spread out :) 
>>>>>>>>>>>>>>>>>>  But did not
>>>>>>>>>>>>>>>>>> want to clarify some points before we get started on the 
>>>>>>>>>>>>>>>>>> discussion on your
>>>>>>>>>>>>>>>>>> PR.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> With reusing the table and view objects, we are not
>>>>>>>>>>>>>>>>>>> changing the existing metadata of either table or view spec 
>>>>>>>>>>>>>>>>>>> but rather
>>>>>>>>>>>>>>>>>>> introduce new properties and formalize them to express 
>>>>>>>>>>>>>>>>>>> materialized views
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On this point, I am not 100% sure that choosing to
>>>>>>>>>>>>>>>>>> represent a MaterializedView as a separate View + Table 
>>>>>>>>>>>>>>>>>> object precludes us
>>>>>>>>>>>>>>>>>> from adding to metadata of Table or View as the Draft Spec 
>>>>>>>>>>>>>>>>>> suggested,
>>>>>>>>>>>>>>>>>> though.  I think this point was discussed in Jan's initial 
>>>>>>>>>>>>>>>>>> PR with a good
>>>>>>>>>>>>>>>>>> point from Ryan:
>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/6420#issuecomment-1369280546
>>>>>>>>>>>>>>>>>>  that
>>>>>>>>>>>>>>>>>> using Table Properties to track lineage is fairly brittle, 
>>>>>>>>>>>>>>>>>> and having it
>>>>>>>>>>>>>>>>>> formalized in the Iceberg metadata is cleaner, and that was 
>>>>>>>>>>>>>>>>>> thus the
>>>>>>>>>>>>>>>>>> direction of the Draft Spec in the design doc.  What do 
>>>>>>>>>>>>>>>>>> people think?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 5:35 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks Szehon.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The reason for the difference is that the proposal in
>>>>>>>>>>>>>>>>>>> the Google doc is based on a new MV model, hence, new 
>>>>>>>>>>>>>>>>>>> metadata fields and a
>>>>>>>>>>>>>>>>>>> new metadata model were being introduced (with types, 
>>>>>>>>>>>>>>>>>>> optionality, etc).
>>>>>>>>>>>>>>>>>>> With reusing the table and view objects, we are not 
>>>>>>>>>>>>>>>>>>> changing the existing
>>>>>>>>>>>>>>>>>>> metadata of either table or view spec but rather introduce 
>>>>>>>>>>>>>>>>>>> new properties
>>>>>>>>>>>>>>>>>>> and formalize them to express materialized views. This 
>>>>>>>>>>>>>>>>>>> would be the answer
>>>>>>>>>>>>>>>>>>> to most of the questions you posted on the PR (besides some 
>>>>>>>>>>>>>>>>>>> naming
>>>>>>>>>>>>>>>>>>> questions, which I think should be straightforward).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> With that fundamental difference, we cannot lift and
>>>>>>>>>>>>>>>>>>> shift what is in the doc to any PR. Further, having 
>>>>>>>>>>>>>>>>>>> consensus on separate
>>>>>>>>>>>>>>>>>>> table and view objects contradicts with the point being 
>>>>>>>>>>>>>>>>>>> made on having
>>>>>>>>>>>>>>>>>>> consensus on the doc. We might have had agreements on some 
>>>>>>>>>>>>>>>>>>> elements, but
>>>>>>>>>>>>>>>>>>> definitely not on the whole doc, proven by the follow ups 
>>>>>>>>>>>>>>>>>>> (also as a
>>>>>>>>>>>>>>>>>>> community, not individuals).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Therefore: we need a new space to discuss the separate
>>>>>>>>>>>>>>>>>>> table and view properties.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is the question whether to:
>>>>>>>>>>>>>>>>>>> 1- Create a new doc
>>>>>>>>>>>>>>>>>>> 2- Create a new PR?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I feel a PR is the most effective way, especially given
>>>>>>>>>>>>>>>>>>> the fact that we discussed the topic a lot by now. If we 
>>>>>>>>>>>>>>>>>>> agree, we can
>>>>>>>>>>>>>>>>>>> continue the discussion on the PR, else, we can create a 
>>>>>>>>>>>>>>>>>>> doc.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 4:39 PM Szehon Ho <
>>>>>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks Walaa for driving it forward, looking forward to
>>>>>>>>>>>>>>>>>>>> thinking about implementation of Materialized Views.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I see Jan's point, the PR spec change is similar but
>>>>>>>>>>>>>>>>>>>> does not seem to be completely aligned with the Draft Spec 
>>>>>>>>>>>>>>>>>>>> in the design
>>>>>>>>>>>>>>>>>>>> doc:
>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/
>>>>>>>>>>>>>>>>>>>> .  I left my comments on PR of those sections with the 
>>>>>>>>>>>>>>>>>>>> links to the
>>>>>>>>>>>>>>>>>>>> difference.  I think most of those Draft Spec proposal is 
>>>>>>>>>>>>>>>>>>>> still applicable
>>>>>>>>>>>>>>>>>>>> after the decision to have separate Table and View objects 
>>>>>>>>>>>>>>>>>>>>  It will be
>>>>>>>>>>>>>>>>>>>> interesting to at least see drill a bit further why we did 
>>>>>>>>>>>>>>>>>>>> not choose the
>>>>>>>>>>>>>>>>>>>> approach in the Draft Spec and chose another way.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, May 8, 2024 at 4:48 AM Jan Kaul
>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid>
>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Well, everybody that actively contributed to the
>>>>>>>>>>>>>>>>>>>>> discussion on the original google doc was in consensus. 
>>>>>>>>>>>>>>>>>>>>> That's why I
>>>>>>>>>>>>>>>>>>>>> brought up the topic at the Community Sync on the 
>>>>>>>>>>>>>>>>>>>>> 2024-02-14 (
>>>>>>>>>>>>>>>>>>>>> https://youtu.be/uAQVGd5zV4I?t=890) to raise the
>>>>>>>>>>>>>>>>>>>>> awareness of the broader community. After which the 
>>>>>>>>>>>>>>>>>>>>> discussion about the
>>>>>>>>>>>>>>>>>>>>> storage model started. I don't think that the discussion 
>>>>>>>>>>>>>>>>>>>>> about a single
>>>>>>>>>>>>>>>>>>>>> aspect of a proposal should invalidate all other aspects 
>>>>>>>>>>>>>>>>>>>>> of the proposal.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regardless, the state of the proposal from the
>>>>>>>>>>>>>>>>>>>>> original google doc contains a lot of valuable 
>>>>>>>>>>>>>>>>>>>>> contributions from Micah,
>>>>>>>>>>>>>>>>>>>>> Szehon, Jack, Dan, yourself and others and it should at 
>>>>>>>>>>>>>>>>>>>>> least provide the
>>>>>>>>>>>>>>>>>>>>> basis for any further discussion. I don't think it's 
>>>>>>>>>>>>>>>>>>>>> effective to start
>>>>>>>>>>>>>>>>>>>>> with a completely different design because we are bound 
>>>>>>>>>>>>>>>>>>>>> to have the same
>>>>>>>>>>>>>>>>>>>>> discussions all over again.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks, Jan
>>>>>>>>>>>>>>>>>>>>> On 08.05.24 12:11, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The only consensus the community had was on the object
>>>>>>>>>>>>>>>>>>>>> model through the most recent voting thread [1]. This 
>>>>>>>>>>>>>>>>>>>>> kind of consensus was
>>>>>>>>>>>>>>>>>>>>> not present during the doc discussions, and this should 
>>>>>>>>>>>>>>>>>>>>> be evident from the
>>>>>>>>>>>>>>>>>>>>> fact the last doc state listed 5 alternatives with no 
>>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>> conclusion. I am not quite sure what type of consensus we 
>>>>>>>>>>>>>>>>>>>>> are referring to
>>>>>>>>>>>>>>>>>>>>> here given all the follow up discussions, alternatives, 
>>>>>>>>>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Due to the separate object model, the PR is
>>>>>>>>>>>>>>>>>>>>> fundamentally different from the doc in the sense it does 
>>>>>>>>>>>>>>>>>>>>> not propose a new
>>>>>>>>>>>>>>>>>>>>> metadata model but rather formalizes some new table and 
>>>>>>>>>>>>>>>>>>>>> view properties
>>>>>>>>>>>>>>>>>>>>> related to MVs. That is also one reason there are no 
>>>>>>>>>>>>>>>>>>>>> repeated discussions.
>>>>>>>>>>>>>>>>>>>>> That said, if you feel there is a repeated discussion 
>>>>>>>>>>>>>>>>>>>>> (which I do not see
>>>>>>>>>>>>>>>>>>>>> so far), it would be best to link the relevant discussion 
>>>>>>>>>>>>>>>>>>>>> from the doc in a
>>>>>>>>>>>>>>>>>>>>> comment.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Happy to move the discussion elsewhere if there is
>>>>>>>>>>>>>>>>>>>>> sufficient support for this idea, but as things stand, I 
>>>>>>>>>>>>>>>>>>>>> do not see this as
>>>>>>>>>>>>>>>>>>>>> an efficient way to make progress. It sounds we have been 
>>>>>>>>>>>>>>>>>>>>> re-emphasizing
>>>>>>>>>>>>>>>>>>>>> the same points in the last two replies, so I will let 
>>>>>>>>>>>>>>>>>>>>> others chime in at
>>>>>>>>>>>>>>>>>>>>> this point.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread/rotmqzmwk5jrcsyxhzjhrvcjs5v3yjcc
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, May 8, 2024 at 2:31 AM Jan Kaul
>>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid>
>>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The original google doc
>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?usp=sharing>
>>>>>>>>>>>>>>>>>>>>>> discussed multiple
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>

Re: Materialized Views: Next Steps

Reply via email to