I don't think we're in a position to open a vote (or maybe there's a
misunderstanding of what the vote is meant to achieve).

We need to continue the discussion until there is a general consensus on
the direction we want to go (not on what options are available).

The vote is a confirmation of the direction, not a way to settle
disagreements about approaches.

I think we need to have a more focused discussion (this can either be at a
sync or we can schedule a time).

-Dan



On Mon, Apr 1, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Walaa
>
> Yes, I think it makes sense to go with a vote, now that pros/cons are
> clearly stated in the doc.
>
> Thanks !
> Regards
> JB
>
> On Tue, Apr 2, 2024 at 3:59 AM Walaa Eldin Moustafa
> <wa.moust...@gmail.com> wrote:
> >
> > Hi all, there has not been new activity on the doc for some time. Should
> we consider voting?
> >
> > On Thu, Mar 28, 2024 at 6:59 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> >>
> >> Yes, correct, thanks Manu for pointing it out.
> >>
> >> Thanks !
> >> Regards
> >> JB
> >>
> >> On Thu, Mar 28, 2024 at 9:55 AM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
> >> >
> >> > I think Jan already created it
> >> > https://github.com/apache/iceberg/issues/10043
> >> >
> >> > On Thu, Mar 28, 2024 at 4:46 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >> >>
> >> >> Hi Walaa,
> >> >>
> >> >> Yes, I think it would be great to create the GH Issue with the
> >> >> proposal template, it would allow us to track the proposal and link
> >> >> the doc (the comments should go in the doc directly).
> >> >> Please, let me know if I can help on that.
> >> >>
> >> >> I'm working on a PR to list the proposals on the website and the
> >> >> "stale reminder".
> >> >>
> >> >> Thanks !
> >> >> Regards
> >> >> JB
> >> >>
> >> >> On Thu, Mar 28, 2024 at 6:52 AM Walaa Eldin Moustafa
> >> >> <wa.moust...@gmail.com> wrote:
> >> >> >
> >> >> > Do we need to create a proposal issue specifically to track this
> doc?
> >> >> >
> >> >> > Also, everyone, since there have been some updates, it would be good
> to chime in again to discuss them. (doc link here for convenience).
> >> >> >
> >> >> > Thanks,
> >> >> > Walaa.
> >> >> >
> >> >> >
> >> >> > On Tue, Mar 26, 2024 at 11:37 PM Jean-Baptiste Onofré <
> j...@nanthrax.net> wrote:
> >> >> >>
> >> >> >> It sounds good. I would also propose to use the "proposal
> process":
> >> >> >> creating a github issue with the "proposal" tag and link the
> document
> >> >> >> there in a comment.
> >> >> >>
> >> >> >> Regards
> >> >> >> JB
> >> >> >>
> >> >> >> On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa
> >> >> >> <wa.moust...@gmail.com> wrote:
> >> >> >> >
> >> >> >> > Thanks Jan! To avoid spreading discussions across multiple places,
> I will continue the comments on the doc. Also, it is easier to run into
> communication gaps in email threads since effectively we have one thread,
> but in docs we have many.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Walaa.
> >> >> >> >
> >> >> >> > On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul
> <jank...@mailbox.org.invalid> wrote:
> >> >> >> >>
> >> >> >> >> I've added a description to the "Combined metadata" Option of
> Walaa's document. I'm also adding it here:
> >> >> >> >>
> >> >> >> >> This option treats the underlying view and storage table as a
> combined catalog object. The operation of this combined approach can be
> best demonstrated by looking at the different layers of the Iceberg
> implementation. At the top layer is the Iceberg library that interacts with
> a particular Iceberg catalog. The catalog handles the access to the
> metadata storage.
> >> >> >> >> This option uses a combined storage object to store view and
> table metadata related to the materialized view. To avoid the definition of
> an entirely new metadata format, the storage object is composed of the view
> and table metadata. Additionally, the combined storage object has a single
> identifier in the catalogs. The Iceberg library treats the materialized
> view as a separate view and a storage table object; it is only at the
> catalog and storage layer that the materialized view is treated as a single
> entity.
> >> >> >> >> To reuse most of the existing TableCatalog, ViewCatalog and
> their operations, the table and view catalog can be thought of as “filters”
> (lenses) that allow the interaction only with the corresponding part of
> the MV storage object. Performing a “CommitView” operation on the view
> catalog will only affect the view metadata part of the combined MV storage
> object. Similarly, performing a “CommitTable” operation on the table
> catalog will only affect the table metadata part of the combined MV storage
> object. Both catalogs use the same identifier for operations on the
> materialized view.
> >> >> >> >> The creation of a materialized view is done with the
> “createView” operation (with an additional materialization flag) on the view
> catalog, creating a combined MV storage object with an empty storage table.
> >> >> >> >> One could entirely reuse the existing API for loading the
> materialized view metadata as follows. When calling the “loadView” method
> of the ViewCatalog, the catalog implementation fetches and caches the
> entire MV metadata object in process and returns the view metadata part.
> When the “loadTable” method of the TableCatalog is then called to obtain
> the storage table, it returns the table part of the cached MV metadata
> object.
> >> >> >> >>
> >> >> >> >> Best wishes,
> >> >> >> >>
> >> >> >> >> Jan
> >> >> >> >>
> >> >> >> >> On 3/26/24 9:08 AM, Jan Kaul wrote:
> >> >> >> >>
> >> >> >> >> I think it makes sense if I use the "Description" section of
> your document to clarify what I imagine a combined MV solution would look like.
> This would simplify the discussion about pros and cons, because we can
> reference or extend the description. I will try to find the time later
> today.
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >>
> >> >> >> >> Jan
> >> >> >> >>
> >> >> >> >> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
> >> >> >> >>
> >> >> >> >> Thanks Jan! I am not sure if you would like to make
> suggestions to revise the options themselves or the current options' pros
> and cons. In either case, as mentioned earlier, we can do that on the doc
> and once we agree on the options and their pros and cons we can move
> forward. How does that sound?
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >> Walaa.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul
> <jank...@mailbox.org.invalid> wrote:
> >> >> >> >>>
> >> >> >> >>> I have the feeling that the current pros and cons from the
> summary target a version of the MV spec that wasn't really part of the
> discussion. The current arguments target a completely new specification for
> materialized views, which we agreed is out of scope. Instead of a
> completely new specification, the argument was made for an MV metadata object
> that embeds the View and the Table metadata, which was Option 6 in Jack's
> summary document. With that approach the "commitView" and "commitTable"
> operations don't have to be changed and only the "loadView" operation has
> to be adapted. Additionally, compaction and snapshot expiration can be
> reused for the embedded solution. With that in mind, the cons 2, 4, 5, 6
> from the summary don't really apply.
> >> >> >> >>>
> >> >> >> >>> Furthermore, I think we should distinguish between pros and
> cons for the implementers and the users, because most of the pros (no new
> operations) for separate objects (option 1) are for the implementers and
> most of the pros (single logical object, doesn't require 2 loads) for
> combined objects (option 3) are for the users. In my opinion, in the long
> run the design decisions should be focused more on the users' preferences
> than the implementers'.
> >> >> >> >>> On 3/25/24 14:49, Benny Chow wrote:
> >> >> >> >>>
> >> >> >> >>> Hi Manu
> >> >> >> >>>
> >> >> >> >>> This is Walaa's Spark implementation for option 1:
> https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
> >> >> >> >>> There's no code for option 2 yet.
> >> >> >> >>>
> >> >> >> >>> Best
> >> >> >> >>> Benny
> >> >> >> >>>
> >> >> >> >>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang <
> owenzhang1...@gmail.com> wrote:
> >> >> >> >>>>
> >> >> >> >>>> Thanks Walaa for the summary. It's unclear to me which is
> the reference implementation for option 1 and which is the reference MV spec
> for option 2 from the context. I can find some links in the References section
> but I'm not sure which should be referred to for each.
> >> >> >> >>>>
> >> >> >> >>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>> Thanks Himadri for the questions. At this point, our
> objective is to have a common understanding of both options and their pros
> and cons. The best way to achieve this is to iterate on the doc to discuss
> the details of each option or their pros and cons. We can always add more
> details or update the pros and cons. The main thing is to keep the options
> to two so that we keep the scope manageable.
> >> >> >> >>>>>
> >> >> >> >>>>> Once we have a common understanding, it will be easy to
> make a choice and move forward. Therefore, I would suggest reframing your
> questions as suggestions to add more details to the options,
> questions on how either works, or discussions of their pros and cons on the
> doc.
> >> >> >> >>>>>
> >> >> >> >>>>> Thanks,
> >> >> >> >>>>> Walaa.
> >> >> >> >>>>>
>
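
For readers following the "Combined metadata" option described in the quoted
thread above, the idea can be illustrated with a minimal Python sketch. This
is not Iceberg code: every class and method name below (MVStorageObject,
CombinedStore, ViewCatalogLens, TableCatalogLens) is hypothetical, an
in-memory dict stands in for the metadata storage, and dropping the cache
entry on commit is an assumption of the sketch, not something stated in the
thread. It only shows two "lenses" over one stored object that share a single
identifier, with the table load served from the object cached by the view load.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MVStorageObject:
    """Hypothetical combined object: view metadata plus storage-table metadata."""
    view_metadata: dict
    table_metadata: dict = field(default_factory=dict)  # empty storage table at creation


class CombinedStore:
    """Hypothetical metadata storage: one object per materialized-view identifier."""

    def __init__(self) -> None:
        self.objects: Dict[str, MVStorageObject] = {}
        self.reads = 0  # counts round trips to the storage layer

    def read(self, identifier: str) -> MVStorageObject:
        self.reads += 1
        return self.objects[identifier]


class ViewCatalogLens:
    """Exposes only the view part of the combined object."""

    def __init__(self, store: CombinedStore, cache: Dict[str, MVStorageObject]) -> None:
        self.store = store
        self.cache = cache

    def create_view(self, identifier: str, view_metadata: dict, materialized: bool = False) -> None:
        if not materialized:
            raise NotImplementedError("this sketch only covers materialized views")
        # Creates the combined object with an empty storage table.
        self.store.objects[identifier] = MVStorageObject(view_metadata=dict(view_metadata))

    def load_view(self, identifier: str) -> dict:
        # Fetch and cache the whole object, return only the view part.
        mv = self.store.read(identifier)
        self.cache[identifier] = mv
        return mv.view_metadata

    def commit_view(self, identifier: str, view_metadata: dict) -> None:
        # Touches only the view part; the table part is left as-is.
        self.store.objects[identifier].view_metadata = dict(view_metadata)
        self.cache.pop(identifier, None)  # assumption: drop the cached copy on commit


class TableCatalogLens:
    """Exposes only the storage-table part of the combined object."""

    def __init__(self, store: CombinedStore, cache: Dict[str, MVStorageObject]) -> None:
        self.store = store
        self.cache = cache

    def load_table(self, identifier: str) -> dict:
        # Serve the table part from the cached object when one is available.
        mv = self.cache.get(identifier)
        if mv is None:
            mv = self.store.read(identifier)
            self.cache[identifier] = mv
        return mv.table_metadata

    def commit_table(self, identifier: str, table_metadata: dict) -> None:
        # Touches only the table part; the view part is left as-is.
        self.store.objects[identifier].table_metadata = dict(table_metadata)
        self.cache.pop(identifier, None)  # assumption: drop the cached copy on commit
```

In this sketch, calling load_view and then load_table with the same
identifier costs one storage read, which is the "doesn't require 2 loads"
property mentioned in the thread.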
