Yes, correct, thanks Manu for pointing it out.

Thanks !
Regards
JB

On Thu, Mar 28, 2024 at 9:55 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
>
> I think Jan already created it
> https://github.com/apache/iceberg/issues/10043
>
> Jean-Baptiste Onofré <j...@nanthrax.net> wrote on Thu, Mar 28, 2024 at 16:46:
>>
>> Hi Walaa,
>>
>> Yes, I think it would be great to create the GH Issue with the
>> proposal template, it would allow us to track the proposal and link
>> the doc (the comments should go in the doc directly).
>> Please, let me know if I can help on that.
>>
>> I'm working on a PR to list the proposals on the website and the
>> "stale reminder".
>>
>> Thanks !
>> Regards
>> JB
>>
>> On Thu, Mar 28, 2024 at 6:52 AM Walaa Eldin Moustafa
>> <wa.moust...@gmail.com> wrote:
>> >
>> > Do we need to create a proposal issue specifically to track this doc?
>> >
>> > Also, everyone, since there have been some updates, it would be good to
>> > chime in again to discuss them. (doc link here for convenience).
>> >
>> > Thanks,
>> > Walaa.
>> >
>> >
>> > On Tue, Mar 26, 2024 at 11:37 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
>> > wrote:
>> >>
>> >> It sounds good. I would also propose to use the "proposal process":
>> >> creating a GitHub issue with the "proposal" tag and linking the document
>> >> there in a comment.
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa
>> >> <wa.moust...@gmail.com> wrote:
>> >> >
>> >> > Thanks Jan! To avoid spreading the discussion across multiple places, I
>> >> > will continue the comments on the doc. Also, it is easier to run into
>> >> > communication gaps in email threads, since effectively we have one
>> >> > thread, but in docs we have many.
>> >> >
>> >> > Thanks,
>> >> > Walaa.
>> >> >
>> >> > On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul <jank...@mailbox.org.invalid> 
>> >> > wrote:
>> >> >>
>> >> >> I've added a description to the "Combined metadata" Option of Walaa's 
>> >> >> document. I'm also adding it here:
>> >> >>
>> >> >> This option treats the underlying view and storage table as a combined 
>> >> >> catalog object. The operation of this combined approach can be best 
>> >> >> demonstrated by looking at the different layers of the Iceberg 
>> >> >> implementation. In the top layer is the Iceberg library that interacts 
>> >> >> with a particular Iceberg catalog. The catalog handles the access to 
>> >> >> the metadata storage.
>> >> >> This option uses a combined storage object to store view and table 
>> >> >> metadata related to the materialized view. To avoid the definition of 
>> >> >> an entirely new metadata format, the storage object is composed of the 
>> >> >> view and table metadata. Additionally, the combined storage object has
>> >> >> a single identifier in the catalogs. The Iceberg library treats the
>> >> >> materialized view as a separate view and a storage table object; it is
>> >> >> only at the catalog and storage layer that the materialized view is
>> >> >> treated as a single entity.
>> >> >> To reuse most of the existing TableCatalog, ViewCatalog and their 
>> >> >> operations, the table and view catalog can be thought of as “filters” 
>> >> >> (lenses), that allow the interaction only with the corresponding part 
>> >> >> of the MV storage object. Performing a “CommitView” operation on the 
>> >> >> view catalog will only affect the view metadata part of the combined 
>> >> >> MV storage object. And similarly, performing a “CommitTable” operation 
>> >> >> on the table catalog will only affect the table metadata part of the 
>> >> >> combined MV storage object. Both catalogs use the same identifier for 
>> >> >> operations on the materialized view.
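
The "filter" (lens) idea described above can be illustrated with a small, self-contained Python sketch. This is a toy model only, not Iceberg code: MVStorageObject, ViewLens, TableLens, and the dict-based store are hypothetical names made up for illustration, and the metadata is reduced to plain dicts.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class MVStorageObject:
    """Hypothetical combined object: view part plus storage-table part."""
    view_metadata: dict
    table_metadata: dict


class ViewLens:
    """Stands in for the view catalog: it only rewrites the view part."""

    def __init__(self, store: Dict[str, MVStorageObject]) -> None:
        self.store = store

    def commit_view(self, identifier: str, new_view_metadata: dict) -> None:
        current = self.store[identifier]
        # The table part is carried over untouched.
        self.store[identifier] = replace(current, view_metadata=new_view_metadata)


class TableLens:
    """Stands in for the table catalog: it only rewrites the table part."""

    def __init__(self, store: Dict[str, MVStorageObject]) -> None:
        self.store = store

    def commit_table(self, identifier: str, new_table_metadata: dict) -> None:
        current = self.store[identifier]
        # The view part is carried over untouched.
        self.store[identifier] = replace(current, table_metadata=new_table_metadata)


# Both lenses share one store and one identifier for the materialized view.
store = {"db.mv": MVStorageObject({"sql": "SELECT 1"}, {"snapshots": []})}
ViewLens(store).commit_view("db.mv", {"sql": "SELECT 2"})
TableLens(store).commit_table("db.mv", {"snapshots": ["s1"]})
```

Each commit replaces only its own half of the combined object, which is the property the description relies on. A real catalog would also need atomic commits and conflict handling, which this sketch leaves out.
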
>> >> >> The creation of a materialized view is done with the “createView” 
>> >> >> operation (with additional materialization flag) on the view catalog, 
>> >> >> creating a combined MV storage object with an empty storage table.
>> >> >> One could entirely reuse the existing API for loading the materialized 
>> >> >> view metadata as follows. When calling the “loadView” method of the 
>> >> >> ViewCatalog, the catalog implementation fetches and caches the entire 
>> >> >> MV metadata object in process and returns the view metadata part. When 
>> >> >> the “loadTable” method of the TableCatalog is then called to obtain 
>> >> >> the storage table, it returns the table part of the cached MV metadata 
>> >> >> object.
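
The create and load flow described above could look roughly like the following Python sketch. It is a toy model under stated assumptions, not the Iceberg API: CombinedMVCatalog, its snake_case methods, the "materialized" flag, the dict layout, and the fetch counter are all hypothetical and exist only to show that one fetch can serve both the view load and the table load.

```python
from typing import Dict


class CombinedMVCatalog:
    """Toy catalog that stores one combined object per materialized view."""

    def __init__(self) -> None:
        self._storage: Dict[str, dict] = {}  # stands in for the metadata storage
        self._cache: Dict[str, dict] = {}    # in-process cache of fetched objects
        self.fetch_count = 0                 # counts round trips to storage

    def create_view(self, identifier: str, view_metadata: dict,
                    materialized: bool = False) -> None:
        obj = {"view": view_metadata}
        if materialized:
            # A materialized view starts with an empty storage table.
            obj["table"] = {"snapshots": []}
        self._storage[identifier] = obj

    def _fetch(self, identifier: str) -> dict:
        self.fetch_count += 1
        return self._storage[identifier]

    def load_view(self, identifier: str) -> dict:
        # Fetch the whole combined object, cache it, return only the view part.
        obj = self._fetch(identifier)
        self._cache[identifier] = obj
        return obj["view"]

    def load_table(self, identifier: str) -> dict:
        # Serve the table part from the cache if load_view already fetched it.
        obj = self._cache.get(identifier)
        if obj is None:
            obj = self._fetch(identifier)
            self._cache[identifier] = obj
        return obj["table"]


catalog = CombinedMVCatalog()
catalog.create_view("db.mv", {"sql": "SELECT 1"}, materialized=True)
view = catalog.load_view("db.mv")
table = catalog.load_table("db.mv")
```

In this sketch the second call does not go back to storage, so loading the view and then its storage table costs a single fetch. Cache invalidation after a commit is deliberately not modeled.
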
>> >> >>
>> >> >> Best wishes,
>> >> >>
>> >> >> Jan
>> >> >>
>> >> >> On 3/26/24 9:08 AM, Jan Kaul wrote:
>> >> >>
>> >> >> I think it makes sense if I use the "Description" section of your
>> >> >> document to clarify what I imagine a combined MV solution to look like.
>> >> >> This would simplify the discussion about pros and cons, because we can 
>> >> >> reference or extend the description. I will try to find the time later 
>> >> >> today.
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Jan
>> >> >>
>> >> >> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
>> >> >>
>> >> >> Thanks Jan! I am not sure if you would like to make suggestions to
>> >> >> revise the options themselves or the current options' pros and cons. In
>> >> >> either case, as mentioned earlier, we can do that on the doc and once 
>> >> >> we agree on the options and their pros and cons we can move forward. 
>> >> >> How does that sound?
>> >> >>
>> >> >> Thanks,
>> >> >> Walaa.
>> >> >>
>> >> >>
>> >> >> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul <jank...@mailbox.org.invalid> 
>> >> >> wrote:
>> >> >>>
>> >> >>> I have the feeling that the current pros and cons from the summary 
>> >> >>> target a version of the MV spec that wasn't really part of the 
>> >> >>> discussion. The current arguments target a completely new
>> >> >>> specification for materialized views, which we agreed is out of
>> >> >>> scope. Instead of a completely new specification, the argument was
>> >> >>> made for an MV metadata object that embeds the View and the Table
>> >> >>> metadata, which was Option 6 in Jack's summary document. With that
>> >> >>> approach the "commitView" and "commitTable" operations don't have to
>> >> >>> be changed and only the "loadView" operation has to be adapted.
>> >> >>> Additionally, compaction and snapshot expiration can be reused for 
>> >> >>> the embedded solution. With that in mind, the cons 2, 4, 5, 6 from 
>> >> >>> the summary don't really apply.
>> >> >>>
>> >> >>> Furthermore, I think we should distinguish between pros and cons for
>> >> >>> the implementers and the users, because most of the pros (no new
>> >> >>> operations) for separate objects (option 1) are for the implementers
>> >> >>> and most of the pros (single logical object, doesn't require 2 loads)
>> >> >>> for combined objects (option 3) are for the users. In my opinion, in
>> >> >>> the long run the design decisions should be focused more on the user
>> >> >>> preferences than on those of the implementers.
>> >> >>>
>> >> >>> On 3/25/24 14:49, Benny Chow wrote:
>> >> >>>
>> >> >>> Hi Manu
>> >> >>>
>> >> >>> This is Walaa's Spark implementation for option 1:  
>> >> >>> https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
>> >> >>> There's no code for option 2 yet.
>> >> >>>
>> >> >>> Best
>> >> >>> Benny
>> >> >>>
>> >> >>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang <owenzhang1...@gmail.com> 
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Thanks Walaa for the summary. It's unclear to me from the context which
>> >> >>>> is the reference implementation for option 1 and which is the reference
>> >> >>>> MV spec for option 2. I can find some links in the References section,
>> >> >>>> but I am not sure which one belongs to which option.
>> >> >>>>
>> >> >>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa 
>> >> >>>> <wa.moust...@gmail.com> wrote:
>> >> >>>>>
>> >> >>>>> Thanks Himadri for the questions. At this point, our objective is 
>> >> >>>>> to have a common understanding of both options and their pros and 
>> >> >>>>> cons. The best way to achieve this is to iterate on the doc to 
>> >> >>>>> discuss the details of each option or their pros and cons. We can 
>> >> >>>>> always add more details or update the pros and cons. The main thing 
>> >> >>>>> is to keep the options to two so that we keep the scope manageable.
>> >> >>>>>
>> >> >>>>> Once we have a common understanding, it will be easy to make a 
>> >> >>>>> choice and move forward. Therefore, I would suggest reframing your
>> >> >>>>> questions, on the doc, as suggestions to add more details to the
>> >> >>>>> options, questions on how either works, or discussions of their
>> >> >>>>> pros and cons.
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Walaa.
>> >> >>>>>
