I think we want to get clarity on the "combined object" approach. Some
discussions are still going on. There is one particular thread
<https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABIuZ8F3I>
that would benefit from some more clarification. Would be great to be on
the same page there.

I would suggest sticking to the doc because the next thread will be the 3rd
thread on this topic and typically it is very hard to get focused
discussions in email threads. If the doc does not work, we can meet.

Thanks,
Walaa.


On Wed, Apr 3, 2024 at 1:03 PM Ryan Blue <b...@tabular.io> wrote:

> If there is consensus, great. We don't usually have a vote when there is
> already consensus. That said, I haven't really seen a confirmation that we
> have consensus, like a thread where people that originally had different
> perspectives all said they favored the same option.
>
> It can help to build clarity by starting a new thread (this one is 70+
> messages) with a clear summary (_not_ a doc) of the direction and ask
> people to speak up if they do or don't agree.
>
> Ryan
>
> On Wed, Apr 3, 2024 at 1:33 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> I thought we had a consensus in the doc at least on the possible
>> option. I understood the vote was to adopt one of the options (that is
>> possible for a vote).
>>
>> If we still need more discussion on the possible options or having a
>> consensus on a specific option, it makes sense to continue the
>> discussion on the doc as long as we are not "blocked" :)
>>
>> Regards
>> JB
>>
>> On Tue, Apr 2, 2024 at 9:12 PM Daniel Weeks <daniel.c.we...@gmail.com>
>> wrote:
>> >
>> > I don't think we're in a position to open a vote (or maybe there's a
>> misunderstanding of what the vote is meant to achieve).
>> >
>> > We need to continue the discussion until there is a general consensus
>> on the direction we want to go (not on what options are available).
>> >
>> > The vote is a confirmation of the direction, not a way to settle
>> disagreements about approaches.
>> >
>> > I think we need to have a more focused discussion (this can either be
>> at a sync or we can schedule a time).
>> >
>> > -Dan
>> >
>> >
>> >
>> > On Mon, Apr 1, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>> >>
>> >> Hi Walaa
>> >>
>> >> Yes, I think it makes sense to go with a vote, now that pros/cons are
>> >> clearly stated in the doc.
>> >>
>> >> Thanks !
>> >> Regards
>> >> JB
>> >>
>> >> On Tue, Apr 2, 2024 at 3:59 AM Walaa Eldin Moustafa
>> >> <wa.moust...@gmail.com> wrote:
>> >> >
>> >> > Hi all, there has not been new activity on the doc for some time.
>> Should we consider voting?
>> >> >
>> >> > On Thu, Mar 28, 2024 at 6:59 AM Jean-Baptiste Onofré <
>> j...@nanthrax.net> wrote:
>> >> >>
>> >> >> Yes, correct, thanks Manu for pointing it out.
>> >> >>
>> >> >> Thanks !
>> >> >> Regards
>> >> >> JB
>> >> >>
>> >> >> On Thu, Mar 28, 2024 at 9:55 AM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>> >> >> >
>> >> >> > I think Jan already created it
>> >> >> > https://github.com/apache/iceberg/issues/10043
>> >> >> >
>> >> >> > On Thu, Mar 28, 2024 at 4:46 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> >> >> >>
>> >> >> >> Hi Walaa,
>> >> >> >>
>> >> >> >> Yes, I think it would be great to create the GH Issue with the
>> >> >> >> proposal template, it would allow us to track the proposal and
>> link
>> >> >> >> the doc (the comments should go in the doc directly).
>> >> >> >> Please, let me know if I can help on that.
>> >> >> >>
>> >> >> >> I'm working on a PR to list the proposals on the website and the
>> >> >> >> "stale reminder".
>> >> >> >>
>> >> >> >> Thanks !
>> >> >> >> Regards
>> >> >> >> JB
>> >> >> >>
>> >> >> >> On Thu, Mar 28, 2024 at 6:52 AM Walaa Eldin Moustafa
>> >> >> >> <wa.moust...@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > Do we need to create a proposal issue specifically to track
>> this doc?
>> >> >> >> >
>> >> >> >> > Also, everyone, since there have been some updates, it would be
>> good to chime in again to discuss them (doc link here for
>> convenience).
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Walaa.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Tue, Mar 26, 2024 at 11:37 PM Jean-Baptiste Onofré <
>> j...@nanthrax.net> wrote:
>> >> >> >> >>
>> >> >> >> >> It sounds good. I would also propose to use the "proposal
>> process":
>> >> >> >> >> creating a GitHub issue with the "proposal" tag and linking the
>> document
>> >> >> >> >> there in a comment.
>> >> >> >> >>
>> >> >> >> >> Regards
>> >> >> >> >> JB
>> >> >> >> >>
>> >> >> >> >> On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa
>> >> >> >> >> <wa.moust...@gmail.com> wrote:
>> >> >> >> >> >
>> >> >> >> >> > Thanks Jan! To avoid spreading discussions across multiple
>> places, I will continue the comments on the doc. Also, it is easier to run
>> into communication gaps in email threads since effectively we have one
>> thread, but in docs we have many.
>> >> >> >> >> >
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Walaa.
>> >> >> >> >> >
>> >> >> >> >> > On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul
>> <jank...@mailbox.org.invalid> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> I've added a description to the "Combined metadata" Option
>> of Walaa's document. I'm also adding it here:
>> >> >> >> >> >>
>> >> >> >> >> >> This option treats the underlying view and storage table
>> as a combined catalog object. The operation of this combined approach can
>> be best demonstrated by looking at the different layers of the Iceberg
>> implementation. In the top layer is the Iceberg library that interacts with
>> a particular Iceberg catalog. The catalog handles the access to the
>> metadata storage.
>> >> >> >> >> >> This option uses a combined storage object to store view
>> and table metadata related to the materialized view. To avoid the
>> definition of an entirely new metadata format, the storage object is
>> composed of the view and table metadata. Additionally the combined storage
>> object has a single identifier in the catalogs. The Iceberg library treats
>> the materialized view as a separate view and a storage table object; it is
>> only at the catalog and storage layer that the materialized view is treated
>> as a single entity.
>> >> >> >> >> >> To reuse most of the existing TableCatalog, ViewCatalog
>> and their operations, the table and view catalog can be thought of as
>> “filters” (lenses), that allow the interaction only with the corresponding
>> part of the MV storage object. Performing a “CommitView” operation on the
>> view catalog will only affect the view metadata part of the combined MV
>> storage object. And similarly, performing a “CommitTable” operation on the
>> table catalog will only affect the table metadata part of the combined MV
>> storage object. Both catalogs use the same identifier for operations on the
>> materialized view.
>> >> >> >> >> >> The creation of a materialized view is done with the
>> “createView” operation (with additional materialization flag) on the view
>> catalog, creating a combined MV storage object with an empty storage table.
>> >> >> >> >> >> One could entirely reuse the existing API for loading the
>> materialized view metadata as follows. When calling the “loadView” method
>> of the ViewCatalog, the catalog implementation fetches and caches the
>> entire MV metadata object in process and returns the view metadata part.
>> When the “loadTable” method of the TableCatalog is then called to obtain
>> the storage table, it returns the table part of the cached MV metadata
>> object.
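
A minimal sketch of the "lenses" idea described above, as a toy in-memory model rather than Iceberg's actual API. Every name here (MVStorageObject, MetadataStore, ViewLens, TableLens, fetch_count) is hypothetical and chosen only for illustration, and metadata is represented as plain dicts. It shows one identifier for the combined object, commits that touch only one part, and a cached load so that loading the view and then the table costs a single fetch.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class MVStorageObject:
    """Hypothetical combined object: view metadata plus storage-table metadata."""
    view_metadata: dict
    table_metadata: dict = field(default_factory=dict)  # empty storage table


class MetadataStore:
    """Hypothetical storage layer: one identifier maps to one combined object."""

    def __init__(self) -> None:
        self._persisted: Dict[str, MVStorageObject] = {}
        self._cache: Dict[str, MVStorageObject] = {}
        self.fetch_count = 0  # counts simulated round trips to storage

    def put(self, identifier: str, obj: MVStorageObject) -> None:
        self._persisted[identifier] = obj
        self._cache.pop(identifier, None)  # invalidate any cached copy

    def load(self, identifier: str) -> MVStorageObject:
        """Return the cached object, fetching and caching it on a miss."""
        if identifier not in self._cache:
            self.fetch_count += 1
            self._cache[identifier] = self._persisted[identifier]
        return self._cache[identifier]


class ViewLens:
    """View-catalog side: only sees and changes the view metadata part."""

    def __init__(self, store: MetadataStore) -> None:
        self._store = store

    def create_view(self, identifier: str, view_metadata: dict,
                    materialized: bool) -> None:
        if not materialized:
            raise ValueError("plain views are outside the scope of this sketch")
        # Creates the combined object with an empty storage table.
        self._store.put(identifier, MVStorageObject(view_metadata))

    def load_view(self, identifier: str) -> dict:
        return self._store.load(identifier).view_metadata

    def commit_view(self, identifier: str, view_metadata: dict) -> None:
        current = self._store.load(identifier)
        self._store.put(identifier, replace(current, view_metadata=view_metadata))


class TableLens:
    """Table-catalog side: only sees and changes the table metadata part."""

    def __init__(self, store: MetadataStore) -> None:
        self._store = store

    def load_table(self, identifier: str) -> dict:
        return self._store.load(identifier).table_metadata

    def commit_table(self, identifier: str, table_metadata: dict) -> None:
        current = self._store.load(identifier)
        self._store.put(identifier, replace(current, table_metadata=table_metadata))
```

In this model both lenses are constructed over the same MetadataStore and use the same identifier, so a commit through one lens leaves the other part of the combined object unchanged.
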
>> >> >> >> >> >>
>> >> >> >> >> >> Best wishes,
>> >> >> >> >> >>
>> >> >> >> >> >> Jan
>> >> >> >> >> >>
>> >> >> >> >> >> On 3/26/24 9:08 AM, Jan Kaul wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> I think it makes sense if I use the "Description" section
>> of your document to clarify what I imagine a combined MV solution would look
>> like. This would simplify the discussion about pros and cons, because we
>> can reference or extend the description. I will try to find the time later
>> today.
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks,
>> >> >> >> >> >>
>> >> >> >> >> >> Jan
>> >> >> >> >> >>
>> >> >> >> >> >> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks Jan! I am not sure if you would like to make
>> suggestions to revise the options themselves or the current options pros
>> and cons. In either case, as mentioned earlier, we can do that on the doc
>> and once we agree on the options and their pros and cons we can move
>> forward. How does that sound?
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks,
>> >> >> >> >> >> Walaa.
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul
>> <jank...@mailbox.org.invalid> wrote:
>> >> >> >> >> >>>
>> >> >> >> >> >>> I have the feeling that the current pros and cons from
>> the summary target a version of the MV spec that wasn't really part of the
>> discussion. The current arguments target a completely new specification for
>> materialized views, which we agreed is out of scope. Instead of a
>> completely new specification the argument was made for a MV metadata object
>> that embeds the View and the Table metadata, which was Option 6 in Jack's
>> summary document. With that approach the "commitView" and "commitTable"
>> operations don't have to be changed and only the "loadView" operation has
>> to be adapted. Additionally, compaction and snapshot expiration can be
>> reused for the embedded solution. With that in mind, the cons 2, 4, 5, 6
>> from the summary don't really apply.
>> >> >> >> >> >>>
>> >> >> >> >> >>> Furthermore, I think we should distinguish between pros
>> and cons for the implementers and the users. Because most of the pros (no
>> new operations) for separate objects (option1) are for the implementers and
>> most of the pros (single logical object, doesn't require 2 loads) for
>> combined objects (option3) are for the users. In my opinion, in the long
>> run the design decisions should be focused more on the user preferences
>> than the implementers.
>> >> >> >> >> >>> On 3/25/24 14:49, Benny Chow wrote:
>> >> >> >> >> >>>
>> >> >> >> >> >>> Hi Manu
>> >> >> >> >> >>>
>> >> >> >> >> >>> This is Walaa's Spark implementation for option 1:
>> https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
>> >> >> >> >> >>> There's no code for option 2 yet.
>> >> >> >> >> >>>
>> >> >> >> >> >>> Best
>> >> >> >> >> >>> Benny
>> >> >> >> >> >>>
>> >> >> >> >> >>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang <
>> owenzhang1...@gmail.com> wrote:
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Thanks Walaa for the summary. It's unclear to me from the
>> context which is the reference implementation for option 1 and which is the
>> reference MV spec for option 2. I can find some links in the References
>> section but I'm not sure which should be referred to for each.
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>> >> >> >> >> >>>>>
>> >> >> >> >> >>>>> Thanks Himadri for the questions. At this point, our
>> objective is to have a common understanding of both options and their pros
>> and cons. The best way to achieve this is to iterate on the doc to discuss
>> the details of each option or their pros and cons. We can always add more
>> details or update the pros and cons. The main thing is to keep the options
>> to two so that we keep the scope manageable.
>> >> >> >> >> >>>>>
>> >> >> >> >> >>>>> Once we have a common understanding, it will be easy to
>> make a choice and move forward. Therefore, I would suggest reframing your
>> questions as either adding suggestions to add more details to the options,
>> questions on how either works, or discussions of their pros and cons on the
>> doc.
>> >> >> >> >> >>>>>
>> >> >> >> >> >>>>> Thanks,
>> >> >> >> >> >>>>> Walaa.
>> >> >> >> >> >>>>>
>>
>
>
> --
> Ryan Blue
> Tabular
>
