Hi Walaa,

Yes, I think it makes sense to go with a vote, now that the pros/cons are clearly stated in the doc.

Thanks!
Regards
JB

On Tue, Apr 2, 2024 at 3:59 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>
> Hi all, there has not been new activity on the doc for some time. Should we
> consider voting?
>
> On Thu, Mar 28, 2024 at 6:59 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>>
>> Yes, correct, thanks Manu for pointing it out.
>>
>> Thanks!
>> Regards
>> JB
>>
>> On Thu, Mar 28, 2024 at 9:55 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
>> >
>> > I think Jan already created it
>> > https://github.com/apache/iceberg/issues/10043
>> >
>> > On Thu, Mar 28, 2024 at 16:46, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> >>
>> >> Hi Walaa,
>> >>
>> >> Yes, I think it would be great to create the GH Issue with the
>> >> proposal template, it would allow us to track the proposal and link
>> >> the doc (the comments should go in the doc directly).
>> >> Please, let me know if I can help on that.
>> >>
>> >> I'm working on a PR to list the proposals on the website and the
>> >> "stale reminder".
>> >>
>> >> Thanks!
>> >> Regards
>> >> JB
>> >>
>> >> On Thu, Mar 28, 2024 at 6:52 AM Walaa Eldin Moustafa
>> >> <wa.moust...@gmail.com> wrote:
>> >> >
>> >> > Do we need to create a proposal issue specifically to track this doc?
>> >> >
>> >> > Also, everyone, since there have been some updates, it would be good to
>> >> > chime in again to discuss the updates. (doc link here for convenience).
>> >> >
>> >> > Thanks,
>> >> > Walaa.
>> >> >
>> >> >
>> >> > On Tue, Mar 26, 2024 at 11:37 PM Jean-Baptiste Onofré
>> >> > <j...@nanthrax.net> wrote:
>> >> >>
>> >> >> It sounds good. I would also propose to use the "proposal process":
>> >> >> creating a github issue with the "proposal" tag and linking the document
>> >> >> there in a comment.
>> >> >>
>> >> >> Regards
>> >> >> JB
>> >> >>
>> >> >> On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa
>> >> >> <wa.moust...@gmail.com> wrote:
>> >> >> >
>> >> >> > Thanks Jan!
>> >> >> > To avoid spreading discussions across multiple places, I
>> >> >> > will continue the comments on the doc. Also it is easier to run into
>> >> >> > communication gaps in email threads since effectively we have one
>> >> >> > thread, but in docs we have many.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Walaa.
>> >> >> >
>> >> >> > On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul
>> >> >> > <jank...@mailbox.org.invalid> wrote:
>> >> >> >>
>> >> >> >> I've added a description to the "Combined metadata" Option of
>> >> >> >> Walaa's document. I'm also adding it here:
>> >> >> >>
>> >> >> >> This option treats the underlying view and storage table as a
>> >> >> >> combined catalog object. The operation of this combined approach
>> >> >> >> can be best demonstrated by looking at the different layers of the
>> >> >> >> Iceberg implementation. At the top layer is the Iceberg library
>> >> >> >> that interacts with a particular Iceberg catalog. The catalog
>> >> >> >> handles the access to the metadata storage.
>> >> >> >> This option uses a combined storage object to store view and table
>> >> >> >> metadata related to the materialized view. To avoid the definition
>> >> >> >> of an entirely new metadata format, the storage object is composed
>> >> >> >> of the view and table metadata. Additionally, the combined storage
>> >> >> >> object has a single identifier in the catalogs. The Iceberg library
>> >> >> >> treats the materialized view as a separate view and a storage table
>> >> >> >> object; it is only at the catalog and storage layer that the
>> >> >> >> materialized view is treated as a single entity.
>> >> >> >> To reuse most of the existing TableCatalog, ViewCatalog and their
>> >> >> >> operations, the table and view catalog can be thought of as
>> >> >> >> “filters” (lenses), that allow the interaction only with the
>> >> >> >> corresponding part of the MV storage object.
>> >> >> >> Performing a
>> >> >> >> “CommitView” operation on the view catalog will only affect the
>> >> >> >> view metadata part of the combined MV storage object. And
>> >> >> >> similarly, performing a “CommitTable” operation on the table
>> >> >> >> catalog will only affect the table metadata part of the combined MV
>> >> >> >> storage object. Both catalogs use the same identifier for
>> >> >> >> operations on the materialized view.
>> >> >> >> The creation of a materialized view is done with the “createView”
>> >> >> >> operation (with additional materialization flag) on the view
>> >> >> >> catalog, creating a combined MV storage object with an empty
>> >> >> >> storage table.
>> >> >> >> One could entirely reuse the existing API for loading the
>> >> >> >> materialized view metadata as follows. When calling the “loadView”
>> >> >> >> method of the ViewCatalog, the catalog implementation fetches and
>> >> >> >> caches the entire MV metadata object in process and returns the
>> >> >> >> view metadata part. When the “loadTable” method of the TableCatalog
>> >> >> >> is then called to obtain the storage table, it returns the table
>> >> >> >> part of the cached MV metadata object.
>> >> >> >>
>> >> >> >> Best wishes,
>> >> >> >>
>> >> >> >> Jan
>> >> >> >>
>> >> >> >> On 3/26/24 9:08 AM, Jan Kaul wrote:
>> >> >> >>
>> >> >> >> I think it makes sense if I use the "Description" section of your
>> >> >> >> document to clarify what I imagine a combined MV solution to look
>> >> >> >> like. This would simplify the discussion about pros and cons,
>> >> >> >> because we can reference or extend the description. I will try to
>> >> >> >> find the time later today.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >>
>> >> >> >> Jan
>> >> >> >>
>> >> >> >> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
>> >> >> >>
>> >> >> >> Thanks Jan! I am not sure if you would like to make suggestions to
>> >> >> >> revise the options themselves or the current options' pros and cons.
>> >> >> >> In either case, as mentioned earlier, we can do that on the doc and
>> >> >> >> once we agree on the options and their pros and cons we can move
>> >> >> >> forward. How does that sound?
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Walaa.
>> >> >> >>
>> >> >> >>
>> >> >> >> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul
>> >> >> >> <jank...@mailbox.org.invalid> wrote:
>> >> >> >>>
>> >> >> >>> I have the feeling that the current pros and cons from the summary
>> >> >> >>> target a version of the MV spec that wasn't really part of the
>> >> >> >>> discussion. The current arguments target a completely new
>> >> >> >>> specification for materialized views which, as we agreed, is out of
>> >> >> >>> scope. Instead of a completely new specification the argument was
>> >> >> >>> made for a MV metadata object that embeds the View and the Table
>> >> >> >>> metadata, which was Option 6 in Jack's summary document. With that
>> >> >> >>> approach the "commitView" and "commitTable" operations don't have
>> >> >> >>> to be changed and only the "loadView" operation has to be adapted.
>> >> >> >>> Additionally, compaction and snapshot expiration can be reused for
>> >> >> >>> the embedded solution. With that in mind, the cons 2, 4, 5, 6 from
>> >> >> >>> the summary don't really apply.
>> >> >> >>>
>> >> >> >>> Furthermore, I think we should distinguish between pros and cons
>> >> >> >>> for the implementers and the users. Because most of the pros (no
>> >> >> >>> new operations) for separate objects (option1) are for the
>> >> >> >>> implementers and most of the pros (single logical object, doesn't
>> >> >> >>> require 2 loads) for combined objects (option3) are for the users.
>> >> >> >>> In my opinion, in the long run the design decisions should be
>> >> >> >>> focused more on the user preferences than the implementers.
>> >> >> >>> On 3/25/24 14:49, Benny Chow wrote:
>> >> >> >>>
>> >> >> >>> Hi Manu
>> >> >> >>>
>> >> >> >>> This is Walaa's Spark implementation for option 1:
>> >> >> >>> https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
>> >> >> >>> There's no code for option 2 yet.
>> >> >> >>>
>> >> >> >>> Best
>> >> >> >>> Benny
>> >> >> >>>
>> >> >> >>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang
>> >> >> >>> <owenzhang1...@gmail.com> wrote:
>> >> >> >>>>
>> >> >> >>>> Thanks Walaa for the summary. It's unclear to me which is the
>> >> >> >>>> reference implementation for option 1 and reference MV spec for
>> >> >> >>>> option 2 from the context. I can find some links in the
>> >> >> >>>> References section but not sure which should be referred to
>> >> >> >>>> respectively.
>> >> >> >>>>
>> >> >> >>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa
>> >> >> >>>> <wa.moust...@gmail.com> wrote:
>> >> >> >>>>>
>> >> >> >>>>> Thanks Himadri for the questions. At this point, our objective
>> >> >> >>>>> is to have a common understanding of both options and their pros
>> >> >> >>>>> and cons. The best way to achieve this is to iterate on the doc
>> >> >> >>>>> to discuss the details of each option or their pros and cons. We
>> >> >> >>>>> can always add more details or update the pros and cons. The
>> >> >> >>>>> main thing is to keep the options to two so that we keep the
>> >> >> >>>>> scope manageable.
>> >> >> >>>>>
>> >> >> >>>>> Once we have a common understanding, it will be easy to make a
>> >> >> >>>>> choice and move forward. Therefore, I would suggest reframing
>> >> >> >>>>> your questions as either adding suggestions to add more details
>> >> >> >>>>> to the options, questions on how either works, or discussions of
>> >> >> >>>>> their pros and cons on the doc.
>> >> >> >>>>>
>> >> >> >>>>> Thanks,
>> >> >> >>>>> Walaa.
>> >> >> >>>>>
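
For readers following the "Combined metadata" description quoted in this thread, here is a rough Python sketch of the idea: one stored object per materialized view, with the view and table catalogs acting as "lenses" on its two parts. It is only an illustration under assumptions. The class names (MVStorageObject, CombinedStore, ViewLens, TableLens), the snake_case method names, and the dict-based metadata are all hypothetical and are not the Iceberg API; the real proposal concerns ViewCatalog and TableCatalog in the Iceberg library.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MVStorageObject:
    """Hypothetical combined object: view metadata plus storage-table metadata."""
    view_metadata: dict
    table_metadata: dict = field(default_factory=dict)  # empty storage table at creation


class CombinedStore:
    """Hypothetical metadata storage: one object per MV, under a single identifier."""

    def __init__(self) -> None:
        self.objects: Dict[str, MVStorageObject] = {}
        self.fetches = 0  # counts round trips to storage, for illustration

    def fetch(self, identifier: str) -> MVStorageObject:
        self.fetches += 1
        return self.objects[identifier]


class ViewLens:
    """View-catalog "lens": only touches the view part of the combined object."""

    def __init__(self, store: CombinedStore, cache: Dict[str, MVStorageObject]) -> None:
        self.store, self.cache = store, cache

    def create_view(self, identifier: str, view_metadata: dict, materialized: bool) -> None:
        if not materialized:
            raise ValueError("plain views are out of scope for this sketch")
        # Creates the combined object with an empty storage table.
        self.store.objects[identifier] = MVStorageObject(view_metadata)

    def load_view(self, identifier: str) -> dict:
        # Fetch and cache the whole MV object, return only the view part.
        obj = self.store.fetch(identifier)
        self.cache[identifier] = obj
        return obj.view_metadata

    def commit_view(self, identifier: str, view_metadata: dict) -> None:
        self.store.objects[identifier].view_metadata = view_metadata


class TableLens:
    """Table-catalog "lens": only touches the table part of the combined object."""

    def __init__(self, store: CombinedStore, cache: Dict[str, MVStorageObject]) -> None:
        self.store, self.cache = store, cache

    def load_table(self, identifier: str) -> dict:
        # Serve the table part from the cached MV object when available.
        obj = self.cache.get(identifier) or self.store.fetch(identifier)
        return obj.table_metadata

    def commit_table(self, identifier: str, table_metadata: dict) -> None:
        self.store.objects[identifier].table_metadata = table_metadata


store, cache = CombinedStore(), {}
views, tables = ViewLens(store, cache), TableLens(store, cache)
views.create_view("db.mv", {"sql": "SELECT 1"}, materialized=True)
view_md = views.load_view("db.mv")     # one fetch of the combined object
table_md = tables.load_table("db.mv")  # served from the cache, no second fetch
```

Both lenses use the same identifier, and a commit through one lens leaves the other part untouched, which is the property the description relies on to reuse the existing commit operations.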