I think Jan already created it: https://github.com/apache/iceberg/issues/10043
On Thu, Mar 28, 2024 at 16:46, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Walaa,
>
> Yes, I think it would be great to create the GH Issue with the proposal template, it would allow us to track the proposal and link the doc (the comments should go in the doc directly).
> Please, let me know if I can help on that.
>
> I'm working on a PR to list the proposals on the website and the "stale reminder".
>
> Thanks !
> Regards
> JB
>
> On Thu, Mar 28, 2024 at 6:52 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >
> > Do we need to create a proposal issue specifically to track this doc?
> >
> > Also, everyone, since there have been some updates, it would be good to chime in again to discuss the updates. (doc link here for convenience).
> >
> > Thanks,
> > Walaa.
> >
> >
> > On Tue, Mar 26, 2024 at 11:37 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>
> >> It sounds good. I would also propose to use the "proposal process": creating a github issue with the "proposal" tag and link the document there in a comment.
> >>
> >> Regards
> >> JB
> >>
> >> On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >> >
> >> > Thanks Jan! To avoid spreading discussions across multiple places, I will continue the comments on the doc. Also, it is easier to run into communication gaps in email threads since effectively we have one thread, but in docs we have many.
> >> >
> >> > Thanks,
> >> > Walaa.
> >> >
> >> > On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
> >> >>
> >> >> I've added a description to the "Combined metadata" option of Walaa's document. I'm also adding it here:
> >> >>
> >> >> This option treats the underlying view and storage table as a combined catalog object. The operation of this combined approach can be best demonstrated by looking at the different layers of the Iceberg implementation.
> >> >> At the top layer is the Iceberg library that interacts with a particular Iceberg catalog. The catalog handles the access to the metadata storage.
> >> >> This option uses a combined storage object to store view and table metadata related to the materialized view. To avoid the definition of an entirely new metadata format, the storage object is composed of the view and table metadata. Additionally, the combined storage object has a single identifier in the catalogs. The Iceberg library treats the materialized view as a separate view and a storage table object; it is only at the catalog and storage layer that the materialized view is treated as a single entity.
> >> >> To reuse most of the existing TableCatalog, ViewCatalog and their operations, the table and view catalog can be thought of as “filters” (lenses) that allow interaction only with the corresponding part of the MV storage object. Performing a “CommitView” operation on the view catalog will only affect the view metadata part of the combined MV storage object. Similarly, performing a “CommitTable” operation on the table catalog will only affect the table metadata part of the combined MV storage object. Both catalogs use the same identifier for operations on the materialized view.
> >> >> The creation of a materialized view is done with the “createView” operation (with an additional materialization flag) on the view catalog, creating a combined MV storage object with an empty storage table.
> >> >> One could entirely reuse the existing API for loading the materialized view metadata as follows. When calling the “loadView” method of the ViewCatalog, the catalog implementation fetches and caches the entire MV metadata object in process and returns the view metadata part. When the “loadTable” method of the TableCatalog is then called to obtain the storage table, it returns the table part of the cached MV metadata object.
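
[Editorial sketch] The "lenses" idea in the description above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Iceberg code: `ViewMetadata`, `TableMetadata`, `MVStorageObject`, `MetadataStore`, `ViewLens` and `TableLens` are hypothetical names invented here, the metadata is reduced to a couple of fields, and concurrency and atomic commits are ignored. It shows a single identifier pointing at one combined object, commits that touch only one half of it, and a `load_view` followed by `load_table` that reads the storage only once.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class ViewMetadata:
    """Stand-in for the view part (hypothetical, heavily simplified)."""
    sql: str


@dataclass(frozen=True)
class TableMetadata:
    """Stand-in for the storage-table part (hypothetical, heavily simplified)."""
    snapshot_ids: Tuple[int, ...] = ()


@dataclass(frozen=True)
class MVStorageObject:
    """Combined object stored under a single identifier."""
    view: ViewMetadata
    table: TableMetadata


class MetadataStore:
    """Toy metadata storage; counts reads so the caching is observable."""

    def __init__(self) -> None:
        self._objects: Dict[str, MVStorageObject] = {}
        self.reads = 0

    def get(self, identifier: str) -> MVStorageObject:
        self.reads += 1
        return self._objects[identifier]

    def put(self, identifier: str, obj: MVStorageObject) -> None:
        self._objects[identifier] = obj


class ViewLens:
    """View-catalog "lens": only touches the view part of the MV object."""

    def __init__(self, store: MetadataStore, cache: Dict[str, MVStorageObject]) -> None:
        self._store, self._cache = store, cache

    def create_view(self, identifier: str, sql: str, materialized: bool = False) -> None:
        if not materialized:
            raise NotImplementedError("this sketch only covers materialized views")
        # A new materialized view starts with an empty storage table.
        self._store.put(identifier, MVStorageObject(ViewMetadata(sql), TableMetadata()))

    def load_view(self, identifier: str) -> ViewMetadata:
        mv = self._store.get(identifier)  # fetch the entire MV object
        self._cache[identifier] = mv      # cache it in process
        return mv.view                    # return only the view part

    def commit_view(self, identifier: str, new_view: ViewMetadata) -> None:
        mv = self._store.get(identifier)
        updated = replace(mv, view=new_view)  # table part is left untouched
        self._store.put(identifier, updated)
        self._cache[identifier] = updated


class TableLens:
    """Table-catalog "lens": only touches the table part of the MV object."""

    def __init__(self, store: MetadataStore, cache: Dict[str, MVStorageObject]) -> None:
        self._store, self._cache = store, cache

    def load_table(self, identifier: str) -> TableMetadata:
        mv = self._cache.get(identifier)  # reuse the object cached by load_view
        if mv is None:
            mv = self._store.get(identifier)
            self._cache[identifier] = mv
        return mv.table

    def commit_table(self, identifier: str, new_table: TableMetadata) -> None:
        mv = self._store.get(identifier)
        updated = replace(mv, table=new_table)  # view part is left untouched
        self._store.put(identifier, updated)
        self._cache[identifier] = updated
```

In this toy model both lenses share one store and one cache and are addressed with the same identifier, which mirrors the point that the library still sees a view and a table while the catalog and storage layers see a single entity.
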
> >> >>
> >> >> Best wishes,
> >> >>
> >> >> Jan
> >> >>
> >> >> On 3/26/24 9:08 AM, Jan Kaul wrote:
> >> >>
> >> >> I think it makes sense if I use the "Description" section of your document to clarify what I imagine a combined MV solution would look like. This would simplify the discussion about pros and cons, because we can reference or extend the description. I will try to find the time later today.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Jan
> >> >>
> >> >> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
> >> >>
> >> >> Thanks Jan! I am not sure if you would like to make suggestions to revise the options themselves or the current options' pros and cons. In either case, as mentioned earlier, we can do that on the doc and once we agree on the options and their pros and cons we can move forward. How does that sound?
> >> >>
> >> >> Thanks,
> >> >> Walaa.
> >> >>
> >> >>
> >> >> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
> >> >>>
> >> >>> I have the feeling that the current pros and cons from the summary target a version of the MV spec that wasn't really part of the discussion. The current arguments target a completely new specification for materialized views which, as we agreed, is out of scope. Instead of a completely new specification, the argument was made for an MV metadata object that embeds the View and the Table metadata, which was Option 6 in Jack's summary document. With that approach the "commitView" and "commitTable" operations don't have to be changed and only the "loadView" operation has to be adapted. Additionally, compaction and snapshot expiration can be reused for the embedded solution. With that in mind, the cons 2, 4, 5, 6 from the summary don't really apply.
> >> >>>
> >> >>> Furthermore, I think we should distinguish between pros and cons for the implementers and the users.
> >> >>> Most of the pros (no new operations) for separate objects (option 1) are for the implementers, while most of the pros (single logical object, doesn't require 2 loads) for combined objects (option 3) are for the users. In my opinion, in the long run the design decisions should be focused more on the user preferences than the implementers.
> >> >>>
> >> >>> On 3/25/24 14:49, Benny Chow wrote:
> >> >>>
> >> >>> Hi Manu
> >> >>>
> >> >>> This is Walaa's Spark implementation for option 1: https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
> >> >>> There's no code for option 2 yet.
> >> >>>
> >> >>> Best
> >> >>> Benny
> >> >>>
> >> >>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
> >> >>>>
> >> >>>> Thanks Walaa for the summary. It's unclear to me from the context which is the reference implementation for option 1 and which is the reference MV spec for option 2. I can find some links in the References section but am not sure which should be referred to respectively.
> >> >>>>
> >> >>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >> >>>>>
> >> >>>>> Thanks Himadri for the questions. At this point, our objective is to have a common understanding of both options and their pros and cons. The best way to achieve this is to iterate on the doc to discuss the details of each option or their pros and cons. We can always add more details or update the pros and cons. The main thing is to keep the options to two so that we keep the scope manageable.
> >> >>>>>
> >> >>>>> Once we have a common understanding, it will be easy to make a choice and move forward. Therefore, I would suggest reframing your questions as either suggestions to add more details to the options, questions on how either works, or discussions of their pros and cons on the doc.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Walaa.
> >> >>>>>