It sounds good. I would also propose using the "proposal process":
creating a GitHub issue with the "proposal" tag and linking the document
there in a comment.

Regards
JB

On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa
<wa.moust...@gmail.com> wrote:
>
> Thanks Jan! To avoid spreading the discussion across multiple places, I will 
> continue the comments on the doc. Also, it is easier to run into communication 
> gaps in email threads, since here we effectively have one thread, but in docs 
> we have many.
>
> Thanks,
> Walaa.
>
> On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>
>> I've added a description to the "Combined metadata" Option of Walaa's 
>> document. I'm also adding it here:
>>
>> This option treats the underlying view and storage table as a combined 
>> catalog object. How this combined approach operates is best demonstrated 
>> by looking at the different layers of the Iceberg implementation. The top 
>> layer is the Iceberg library, which interacts with a particular Iceberg 
>> catalog. The catalog handles access to the metadata storage.
>>
>> This option uses a combined storage object to store the view and table 
>> metadata related to the materialized view. To avoid defining an entirely new 
>> metadata format, the storage object is composed of the view and table 
>> metadata. Additionally, the combined storage object has a single identifier 
>> in the catalogs. The Iceberg library treats the materialized view as a 
>> separate view and storage table object; it is only at the catalog and 
>> storage layer that the materialized view is treated as a single entity.
>>
>> To reuse most of the existing TableCatalog, ViewCatalog and their 
>> operations, the table and view catalogs can be thought of as “filters” 
>> (lenses) that allow interaction only with the corresponding part of the 
>> MV storage object. Performing a “CommitView” operation on the view catalog 
>> will only affect the view metadata part of the combined MV storage object. 
>> Similarly, performing a “CommitTable” operation on the table catalog 
>> will only affect the table metadata part of the combined MV storage object. 
>> Both catalogs use the same identifier for operations on the materialized 
>> view.
>>
>> A materialized view is created with the “createView” operation 
>> (with an additional materialization flag) on the view catalog, which creates 
>> a combined MV storage object with an empty storage table.
>>
>> One could entirely reuse the existing API for loading the materialized view 
>> metadata as follows. When calling the “loadView” method of the ViewCatalog, 
>> the catalog implementation fetches and caches the entire MV metadata object 
>> in process and returns the view metadata part. When the “loadTable” method 
>> of the TableCatalog is then called to obtain the storage table, it returns 
>> the table part of the cached MV metadata object.
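
The "lens" idea described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the Iceberg API: the names `MVStorageObject`, `MetadataStore`, `ViewLens` and `TableLens` are hypothetical, metadata is modelled as plain dictionaries, and the process-local cache is a shared dictionary. It shows one combined object under a single identifier, commits that touch only their own half, and a table load served from the object cached by the preceding view load.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MVStorageObject:
    """Hypothetical combined object: view metadata plus storage-table metadata."""
    view_metadata: dict
    table_metadata: dict = field(default_factory=dict)  # empty at creation


class MetadataStore:
    """Stand-in for the metadata storage layer, keyed by a single identifier."""

    def __init__(self) -> None:
        self.objects: Dict[str, MVStorageObject] = {}
        self.loads = 0  # number of round trips to storage

    def load(self, identifier: str) -> MVStorageObject:
        self.loads += 1
        return self.objects[identifier]


class ViewLens:
    """View-catalog side: only sees the view part of the combined object."""

    def __init__(self, store: MetadataStore, cache: Dict[str, MVStorageObject]) -> None:
        self.store = store
        self.cache = cache

    def create_view(self, identifier: str, view_metadata: dict,
                    materialized: bool = False) -> None:
        if not materialized:
            raise NotImplementedError("plain views are outside this sketch")
        # The combined object starts with an empty storage table.
        self.store.objects[identifier] = MVStorageObject(dict(view_metadata))

    def load_view(self, identifier: str) -> dict:
        obj = self.store.load(identifier)  # fetch the whole MV object once
        self.cache[identifier] = obj       # keep it for a later load_table
        return obj.view_metadata

    def commit_view(self, identifier: str, view_metadata: dict) -> None:
        self.store.objects[identifier].view_metadata = dict(view_metadata)


class TableLens:
    """Table-catalog side: only sees the table part of the combined object."""

    def __init__(self, store: MetadataStore, cache: Dict[str, MVStorageObject]) -> None:
        self.store = store
        self.cache = cache

    def load_table(self, identifier: str) -> dict:
        obj = self.cache.get(identifier)
        if obj is None:  # fall back to storage if the view was not loaded first
            obj = self.store.load(identifier)
            self.cache[identifier] = obj
        return obj.table_metadata

    def commit_table(self, identifier: str, table_metadata: dict) -> None:
        self.store.objects[identifier].table_metadata = dict(table_metadata)


# Usage: both lenses share the store, the cache, and the same identifier.
store, cache = MetadataStore(), {}
views, tables = ViewLens(store, cache), TableLens(store, cache)
views.create_view("db.mv", {"sql": "SELECT 1"}, materialized=True)
view_part = views.load_view("db.mv")
table_part = tables.load_table("db.mv")  # served from the cached MV object
```

In this toy model the second load does not reach storage, which corresponds to the "doesn't require 2 loads" point made elsewhere in this thread. Atomicity of commits and cache invalidation are deliberately left out.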
>>
>> Best wishes,
>>
>> Jan
>>
>> On 3/26/24 9:08 AM, Jan Kaul wrote:
>>
>> I think it makes sense if I use the "Description" section of your document 
>> to clarify what I imagine a combined MV solution would look like. This would 
>> simplify the discussion about pros and cons, because we can reference or 
>> extend the description. I will try to find the time later today.
>>
>> Thanks,
>>
>> Jan
>>
>> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
>>
>> Thanks Jan! I am not sure if you would like to make suggestions to revise 
>> the options themselves or the current options' pros and cons. In either case, 
>> as mentioned earlier, we can do that on the doc and once we agree on the 
>> options and their pros and cons we can move forward. How does that sound?
>>
>> Thanks,
>> Walaa.
>>
>>
>> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>>
>>> I have the feeling that the current pros and cons from the summary target a 
>>> version of the MV spec that wasn't really part of the discussion. The 
>>> current arguments target a completely new specification for materialized 
>>> views which, as we agreed, is out of scope. Instead of a completely new 
>>> specification, the argument was made for an MV metadata object that embeds 
>>> the View and the Table metadata, which was Option 6 in Jack's summary 
>>> document. With that approach, the "commitView" and "commitTable" operations 
>>> don't have to be changed and only the "loadView" operation has to be 
>>> adapted. Additionally, compaction and snapshot expiration can be reused for 
>>> the embedded solution. With that in mind, the cons 2, 4, 5, 6 from the 
>>> summary don't really apply.
>>>
>>> Furthermore, I think we should distinguish between pros and cons for the 
>>> implementers and for the users, because most of the pros (no new 
>>> operations) for separate objects (option 1) are for the implementers, while 
>>> most of the pros (single logical object, doesn't require 2 loads) for 
>>> combined objects (option 3) are for the users. In my opinion, in the long 
>>> run the design decisions should focus more on the preferences of the users 
>>> than on those of the implementers.
>>>
>>> On 3/25/24 14:49, Benny Chow wrote:
>>>
>>> Hi Manu
>>>
>>> This is Walaa's Spark implementation for option 1:  
>>> https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
>>> There's no code for option 2 yet.
>>>
>>> Best
>>> Benny
>>>
>>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>
>>>> Thanks Walaa for the summary. It's unclear to me from the context which is 
>>>> the reference implementation for option 1 and which is the reference MV 
>>>> spec for option 2. I can find some links in the References section, but I 
>>>> am not sure which one belongs to which option.
>>>>
>>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa 
>>>> <wa.moust...@gmail.com> wrote:
>>>>>
>>>>> Thanks Himadri for the questions. At this point, our objective is to have 
>>>>> a common understanding of both options and their pros and cons. The best 
>>>>> way to achieve this is to iterate on the doc to discuss the details of 
>>>>> each option or their pros and cons. We can always add more details or 
>>>>> update the pros and cons. The main thing is to keep the options to two so 
>>>>> that we keep the scope manageable.
>>>>>
>>>>> Once we have a common understanding, it will be easy to make a choice and 
>>>>> move forward. Therefore, I would suggest reframing your questions, on the 
>>>>> doc, as suggestions to add more details to the options, questions on how 
>>>>> either option works, or discussions of their pros and cons.
>>>>>
>>>>> Thanks,
>>>>> Walaa.
>>>>>
