Hi Szehon, I think choosing separate view + table objects precludes us from adding new metadata to table and view metadata. Here is one relevant comment [1] from Ryan on the modeling doc, where his point is that we want to avoid introducing new APIs since it requires updating every catalog, and (quoting) even now, we have few implementations that support views because of the problems updating back ends. Therefore, one of the major reasons to avoid a new model with new metadata is to avoid adding new metadata, which introduces this complexity. Here is another similar comment from Renjie [2] on the cons listed for the combined object approach.
Even Ryan's point on the MV issue that you referenced reads to me as he is supportive of the property model. Here are some quotes: > We would still want some MV metadata in table *properties*. > I recommend instead reusing the existing snapshot metadata structure to store what you need as snapshot *properties*. > First, I think we want to avoid keeping much state information in complex table *properties*. Again, here, he is supportive of table properties, but wants to make sure that the information is simple. > We may want additional metadata as well, like a UUID to ensure we have the right view. I don't think we have a UUID in the view spec yet, but we could add one. Here, he is very specific when it comes to new metadata fields, and explicitly calls it out. That is the only new metadata field in that reply and by now it is already supported. It is also not MV-specific. Hope this addresses your question on the property vs new metadata model. [1] https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABK7e3QB4 [2] https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABIonvCGE Thanks, Walaa. On Thu, May 9, 2024 at 5:49 PM Szehon Ho <szehon.apa...@gmail.com> wrote: > Hi Walaa, > > I agree, I definitely do not want yet another pr/doc where discussion > happens. as its already quite spread out :) But did not want to clarify > some points before we get started on the discussion on your PR. > > With reusing the table and view objects, we are not changing the existing >> metadata of either table or view spec but rather introduce new properties >> and formalize them to express materialized views >> > > On this point, I am not 100% sure that choosing to represent a > MaterializedView as a separate View + Table object precludes us from adding > to metadata of Table or View as the Draft Spec suggested, though. I think > this point was discussed in Jan's initial PR with a good point from Ryan: > https://github.com/apache/iceberg/issues/6420#issuecomment-1369280546 that > using Table Properties to track lineage is fairly brittle, and having it > formalized in the Iceberg metadata is cleaner, and that was thus the > direction of the Draft Spec in the design doc. What do people think? > > Thanks > Szehon > > > > On Thu, May 9, 2024 at 5:35 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> > wrote: > >> Thanks Szehon. >> >> The reason for the difference is that the proposal in the Google doc is >> based on a new MV model, hence, new metadata fields and a new metadata >> model were being introduced (with types, optionality, etc). With reusing >> the table and view objects, we are not changing the existing metadata of >> either table or view spec but rather introduce new properties and formalize >> them to express materialized views. This would be the answer to most of the >> questions you posted on the PR (besides some naming questions, which I >> think should be straightforward). >> >> With that fundamental difference, we cannot lift and shift what is in the >> doc to any PR. Further, having consensus on separate table and view objects >> contradicts with the point being made on having consensus on the doc. We >> might have had agreements on some elements, but definitely not on the whole >> doc, proven by the follow ups (also as a community, not individuals). >> >> Therefore: we need a new space to discuss the separate table and view >> properties. >> >> Is the question whether to: >> 1- Create a new doc >> 2- Create a new PR? >> >> I feel a PR is the most effective way, especially given the fact that we >> discussed the topic a lot by now. If we agree, we can continue the >> discussion on the PR, else, we can create a doc. >> >> Thanks, >> Walaa. >> >> >> On Thu, May 9, 2024 at 4:39 PM Szehon Ho <szehon.apa...@gmail.com> wrote: >> >>> Thanks Walaa for driving it forward, looking forward to thinking about >>> implementation of Materialized Views. >>> >>> I see Jan's point, the PR spec change is similar but does not seem to be >>> completely aligned with the Draft Spec in the design doc: >>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/ >>> . I left my comments on PR of those sections with the links to the >>> difference. I think most of those Draft Spec proposal is still applicable >>> after the decision to have separate Table and View objects It will be >>> interesting to at least see drill a bit further why we did not choose the >>> approach in the Draft Spec and chose another way. >>> >>> Thanks >>> Szehon >>> >>> On Wed, May 8, 2024 at 4:48 AM Jan Kaul <jank...@mailbox.org.invalid> >>> wrote: >>> >>>> Well, everybody that actively contributed to the discussion on the >>>> original google doc was in consensus. That's why I brought up the topic at >>>> the Community Sync on the 2024-02-14 ( >>>> https://youtu.be/uAQVGd5zV4I?t=890) to raise the awareness of the >>>> broader community. After which the discussion about the storage model >>>> started. I don't think that the discussion about a single aspect of a >>>> proposal should invalidate all other aspects of the proposal. >>>> >>>> Regardless, the state of the proposal from the original google doc >>>> contains a lot of valuable contributions from Micah, Szehon, Jack, Dan, >>>> yourself and others and it should at least provide the basis for any >>>> further discussion. I don't think it's effective to start with a completely >>>> different design because we are bound to have the same discussions all over >>>> again. >>>> >>>> Thanks, Jan >>>> On 08.05.24 12:11, Walaa Eldin Moustafa wrote: >>>> >>>> The only consensus the community had was on the object model through >>>> the most recent voting thread [1]. This kind of consensus was not present >>>> during the doc discussions, and this should be evident from the fact the >>>> last doc state listed 5 alternatives with no particular conclusion. I am >>>> not quite sure what type of consensus we are referring to here given all >>>> the follow up discussions, alternatives, etc. >>>> >>>> Due to the separate object model, the PR is fundamentally different >>>> from the doc in the sense it does not propose a new metadata model but >>>> rather formalizes some new table and view properties related to MVs. That >>>> is also one reason there are no repeated discussions. That said, if you >>>> feel there is a repeated discussion (which I do not see so far), it would >>>> be best to link the relevant discussion from the doc in a comment. >>>> >>>> Happy to move the discussion elsewhere if there is sufficient support >>>> for this idea, but as things stand, I do not see this as an efficient way >>>> to make progress. It sounds we have been re-emphasizing the same points in >>>> the last two replies, so I will let others chime in at this point. >>>> >>>> [1] https://lists.apache.org/thread/rotmqzmwk5jrcsyxhzjhrvcjs5v3yjcc >>>> >>>> Thanks, >>>> Walaa. >>>> >>>> >>>> On Wed, May 8, 2024 at 2:31 AM Jan Kaul <jank...@mailbox.org.invalid> >>>> <jank...@mailbox.org.invalid> wrote: >>>> >>>>> The original google doc >>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?usp=sharing> >>>>> discussed multiple aspects of the Materialized View spec. One was the >>>>> storage model while others were related to the metadata. After we (Micah, >>>>> Szehon, you, me) reached consensus in the google doc, Jack raised his >>>>> concern about the storage model and the long discussion about the storage >>>>> model started. Now we truly reached consensus about the storage model, >>>>> which is now also reflected in the google doc. All other aspects from the >>>>> google doc about the metadata weren't questioned and still represent the >>>>> consensus. >>>>> >>>>> I would like to *avoid repeating the discussions* in your PR that we >>>>> already had in the google doc. Especially since we reached consensus which >>>>> took a considerable amount of time. >>>>> >>>>> Thanks, Jan >>>>> On 08.05.24 10:21, Walaa Eldin Moustafa wrote: >>>>> >>>>> Thanks Jan. I think we moved on to more alignment steps beyond that >>>>> doc a while ago. After that doc, we have discussed the topic further in 2 >>>>> dev list threads and one more doc >>>>> <https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1> >>>>> (with strictly two options for the storage model to consider). Moreover, >>>>> the original doc grew to 14 pages long with one section comparing 5 design >>>>> alternatives, which made things harder to reach consensus. The lack of >>>>> consensus is what partly led up to the subsequent discussions and call for >>>>> a more focused approach to reach consensus. If we already have a consensus >>>>> on the storage model (separate tables and views), I think we should take >>>>> things further and have continued focused discussions on the specific >>>>> metadata in the form of a PR. I have included all previous discussions >>>>> including the original doc and issue as references in the PR description. >>>>> Please let me know if this works. Happy to hear others' thoughts on the >>>>> best way to move forward. >>>>> >>>>> Thanks, >>>>> Walaa. >>>>> >>>>> >>>>> On Wed, May 8, 2024 at 12:56 AM Jan Kaul <jank...@mailbox.org.invalid> >>>>> <jank...@mailbox.org.invalid> wrote: >>>>> >>>>>> Thanks Walaa for trying to move things along. However I don't think >>>>>> it's a good idea to start a separate discussion about the metadata for >>>>>> materialized views because we already had this discussion and reached >>>>>> consensus in this google doc: >>>>>> >>>>>> >>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?usp=sharing >>>>>> >>>>>> Once the draft is finalized we can adopt the PR to reflect the >>>>>> consensus from the google doc. >>>>>> >>>>>> Best wishes, >>>>>> >>>>>> Jan >>>>>> On 07.05.24 19:11, Walaa Eldin Moustafa wrote: >>>>>> >>>>>> Thanks Steven. I feel it is needed so the MV spec is not scattered >>>>>> across the table and view spec pages. We may add a reference in each >>>>>> respective properties section. >>>>>> >>>>>> On Tue, May 7, 2024 at 10:04 AM Steven Wu <stevenz...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Walaa, thanks for initiating the next step. >>>>>>> >>>>>>> With the agreed model of separate view and storage table, I am >>>>>>> wondering if a separate materialized view spec page is needed. E.g., the >>>>>>> new view metadata (view-materialized and view-storage-table) is probably >>>>>>> good to be added to the view page directly to avoid information >>>>>>> scattering. >>>>>>> The same can be said about the storage table metadata. >>>>>>> >>>>>>> We may keep the separate materialized view page to document >>>>>>> motivation, freshness semantics, etc.. >>>>>>> >>>>>>> On Mon, May 6, 2024 at 10:58 PM Walaa Eldin Moustafa < >>>>>>> wa.moust...@gmail.com> wrote: >>>>>>> >>>>>>>> Hi Everyone, >>>>>>>> >>>>>>>> Thanks again for participating in the modeling discussion [1]. >>>>>>>> Since the outcome of this discussion was to model materialized views as >>>>>>>> separate objects, an Iceberg view and a table, I think the next step >>>>>>>> should be discussing the metadata details for each object. I have >>>>>>>> created a >>>>>>>> PR https://github.com/apache/iceberg/pull/10280 with an initial >>>>>>>> spec improvement. Please feel free to review it and leave feedback >>>>>>>> there. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://lists.apache.org/thread/rotmqzmwk5jrcsyxhzjhrvcjs5v3yjcc >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Walaa. >>>>>>>> >>>>>>>>