A gentle reminder to review and discuss the proposal in the doc.

On Thu, Apr 4, 2024 at 9:22 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Just to clarify: I think we have a consensus on the two possible options. So the vote could be helpful to reach a consensus about which option to pick.
>
> Anyway, we still have discussions going on about this topic :)
>
> Regards
> JB
>
> On Wed, Apr 3, 2024 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
> >
> > If there is consensus, great. We don't usually have a vote when there is already consensus. That said, I haven't really seen a confirmation that we have consensus, like a thread where people that originally had different perspectives all said they favored the same option.
> >
> > It can help to build clarity by starting a new thread (this one is 70+ messages) with a clear summary (_not_ a doc) of the direction and ask people to speak up if they do or don't agree.
> >
> > Ryan
> >
> > On Wed, Apr 3, 2024 at 1:33 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>
> >> I thought we had a consensus in the doc, at least on the possible options. I understood the vote was to adopt one of the options (that is possible for a vote).
> >>
> >> If we still need more discussion on the possible options, or on reaching a consensus on a specific option, it makes sense to continue the discussion on the doc as long as we are not "blocked" :)
> >>
> >> Regards
> >> JB
> >>
> >> On Tue, Apr 2, 2024 at 9:12 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
> >> >
> >> > I don't think we're in a position to open a vote (or maybe there's a misunderstanding of what the vote is set out to achieve).
> >> >
> >> > We need to continue the discussion until there is a general consensus on the direction we want to go (not on what options are available).
> >> >
> >> > The vote is a confirmation of the direction, not a way to settle disagreements about approaches.
> >> >
> >> > I think we need to have a more focused discussion (this can either be at a sync or we can schedule a time).
> >> >
> >> > -Dan
> >> >
> >> > On Mon, Apr 1, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >> >>
> >> >> Hi Walaa
> >> >>
> >> >> Yes, I think it makes sense to go with a vote, now that pros/cons are clearly stated in the doc.
> >> >>
> >> >> Thanks !
> >> >> Regards
> >> >> JB
> >> >>
> >> >> On Tue, Apr 2, 2024 at 3:59 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >> >> >
> >> >> > Hi all, there has not been new activity on the doc for some time. Should we consider voting?
> >> >> >
> >> >> > On Thu, Mar 28, 2024 at 6:59 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >> >> >>
> >> >> >> Yes, correct, thanks Manu for pointing it out.
> >> >> >>
> >> >> >> Thanks !
> >> >> >> Regards
> >> >> >> JB
> >> >> >>
> >> >> >> On Thu, Mar 28, 2024 at 9:55 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
> >> >> >> >
> >> >> >> > I think Jan already created it
> >> >> >> > https://github.com/apache/iceberg/issues/10043
> >> >> >> >
> >> >> >> > On Thu, Mar 28, 2024 at 16:46, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >> >> >> >>
> >> >> >> >> Hi Walaa,
> >> >> >> >>
> >> >> >> >> Yes, I think it would be great to create the GH Issue with the proposal template, it would allow us to track the proposal and link the doc (the comments should go in the doc directly).
> >> >> >> >> Please, let me know if I can help on that.
> >> >> >> >>
> >> >> >> >> I'm working on a PR to list the proposals on the website and the "stale reminder".
> >> >> >> >>
> >> >> >> >> Thanks !
> >> >> >> >> Regards
> >> >> >> >> JB
> >> >> >> >>
> >> >> >> >> On Thu, Mar 28, 2024 at 6:52 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >> >> >> >> >
> >> >> >> >> > Do we need to create a proposal issue specifically to track this doc?
> >> >> >> >> >
> >> >> >> >> > Also, everyone, since there have been some updates, it would be good to chime in again to discuss them (doc link here for convenience).
> >> >> >> >> >
> >> >> >> >> > Thanks,
> >> >> >> >> > Walaa.
> >> >> >> >> >
> >> >> >> >> > On Tue, Mar 26, 2024 at 11:37 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >> >> >> >> >>
> >> >> >> >> >> It sounds good. I would also propose to use the "proposal process": creating a GitHub issue with the "proposal" tag and linking the document there in a comment.
> >> >> >> >> >>
> >> >> >> >> >> Regards
> >> >> >> >> >> JB
> >> >> >> >> >>
> >> >> >> >> >> On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >> >> >> >> >> >
> >> >> >> >> >> > Thanks Jan! To avoid spreading discussions across multiple places, I will continue the comments on the doc. Also, it is easier to run into communication gaps in email threads since effectively we have one thread, but in docs we have many.
> >> >> >> >> >> >
> >> >> >> >> >> > Thanks,
> >> >> >> >> >> > Walaa.
> >> >> >> >> >> >
> >> >> >> >> >> > On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> I've added a description to the "Combined metadata" Option of Walaa's document. I'm also adding it here:
> >> >> >> >> >> >>
> >> >> >> >> >> >> This option treats the underlying view and storage table as a combined catalog object. The operation of this combined approach can be best demonstrated by looking at the different layers of the Iceberg implementation. In the top layer is the Iceberg library that interacts with a particular Iceberg catalog. The catalog handles the access to the metadata storage.
> >> >> >> >> >> >> This option uses a combined storage object to store view and table metadata related to the materialized view.
> >> >> >> >> >> >> To avoid the definition of an entirely new metadata format, the storage object is composed of the view and table metadata. Additionally, the combined storage object has a single identifier in the catalogs. The Iceberg library treats the materialized view as a separate view and a storage table object; it is only at the catalog and storage layer that the materialized view is treated as a single entity.
> >> >> >> >> >> >> To reuse most of the existing TableCatalog, ViewCatalog and their operations, the table and view catalog can be thought of as “filters” (lenses) that allow interaction only with the corresponding part of the MV storage object. Performing a “CommitView” operation on the view catalog will only affect the view metadata part of the combined MV storage object. Similarly, performing a “CommitTable” operation on the table catalog will only affect the table metadata part of the combined MV storage object. Both catalogs use the same identifier for operations on the materialized view.
> >> >> >> >> >> >> The creation of a materialized view is done with the “createView” operation (with an additional materialization flag) on the view catalog, creating a combined MV storage object with an empty storage table.
> >> >> >> >> >> >> One could entirely reuse the existing API for loading the materialized view metadata as follows. When calling the “loadView” method of the ViewCatalog, the catalog implementation fetches and caches the entire MV metadata object in process and returns the view metadata part. When the “loadTable” method of the TableCatalog is then called to obtain the storage table, it returns the table part of the cached MV metadata object.
> >> >> >> >> >> >>
> >> >> >> >> >> >> Best wishes,
> >> >> >> >> >> >>
> >> >> >> >> >> >> Jan
> >> >> >> >> >> >>
> >> >> >> >> >> >> On 3/26/24 9:08 AM, Jan Kaul wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> I think it makes sense if I use the "Description" section of your document to clarify what I imagine a combined MV solution to look like. This would simplify the discussion about pros and cons, because we can reference or extend the description. I will try to find the time later today.
> >> >> >> >> >> >>
> >> >> >> >> >> >> Thanks,
> >> >> >> >> >> >>
> >> >> >> >> >> >> Jan
> >> >> >> >> >> >>
> >> >> >> >> >> >> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> Thanks Jan! I am not sure if you would like to make suggestions to revise the options themselves or the current options' pros and cons. In either case, as mentioned earlier, we can do that on the doc and once we agree on the options and their pros and cons we can move forward. How does that sound?
> >> >> >> >> >> >>
> >> >> >> >> >> >> Thanks,
> >> >> >> >> >> >> Walaa.
> >> >> >> >> >> >>
> >> >> >> >> >> >> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> I have the feeling that the current pros and cons from the summary target a version of the MV spec that wasn't really part of the discussion. The current arguments target a completely new specification for materialized views, which we agreed is out of scope. Instead of a completely new specification, the argument was made for an MV metadata object that embeds the View and the Table metadata, which was Option 6 in Jack's summary document. With that approach the "commitView" and "commitTable" operations don't have to be changed and only the "loadView" operation has to be adapted. Additionally, compaction and snapshot expiration can be reused for the embedded solution.
> >> >> >> >> >> >>> With that in mind, cons 2, 4, 5, and 6 from the summary don't really apply.
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> Furthermore, I think we should distinguish between pros and cons for the implementers and the users, because most of the pros (no new operations) for separate objects (option 1) are for the implementers and most of the pros (single logical object, doesn't require 2 loads) for combined objects (option 3) are for the users. In my opinion, in the long run the design decisions should be focused more on the users' preferences than the implementers'.
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> On 3/25/24 14:49, Benny Chow wrote:
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> Hi Manu
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> This is Walaa's Spark implementation for option 1: https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
> >> >> >> >> >> >>> There's no code for option 2 yet.
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> Best
> >> >> >> >> >> >>> Benny
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> Thanks Walaa for the summary. It's unclear to me from the context which is the reference implementation for option 1 and which is the reference MV spec for option 2. I can find some links in the References section but am not sure which should be referred to respectively.
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >> >> >> >> >> >>>>>
> >> >> >> >> >> >>>>> Thanks Himadri for the questions. At this point, our objective is to have a common understanding of both options and their pros and cons. The best way to achieve this is to iterate on the doc to discuss the details of each option or their pros and cons. We can always add more details or update the pros and cons.
> >> >> >> >> >> >>>>> The main thing is to keep the options to two so that we keep the scope manageable.
> >> >> >> >> >> >>>>>
> >> >> >> >> >> >>>>> Once we have a common understanding, it will be easy to make a choice and move forward. Therefore, I would suggest reframing your questions on the doc as suggestions to add more details to the options, questions on how either works, or discussions of their pros and cons.
> >> >> >> >> >> >>>>>
> >> >> >> >> >> >>>>> Thanks,
> >> >> >> >> >> >>>>> Walaa.
> >> >> >> >> >> >>>>>
> >
> > --
> > Ryan Blue
> > Tabular
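
A minimal sketch of the "Combined metadata" option from Jan's description quoted above, written in Python for brevity. The names MVMetadata, MVStore, ViewLens and TableLens are hypothetical and not part of the Iceberg API, and metadata is modeled as plain dicts. It only illustrates the idea that both catalogs act as lenses over one combined object stored under a single identifier, and that loading the view can cache the whole object so a later table load needs no second fetch.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class MVMetadata:
    """Hypothetical combined storage object: view part plus table part."""
    view: dict
    table: dict = field(default_factory=dict)  # empty storage table on creation


class MVStore:
    """Stand-in for the catalog/storage layer: one identifier per combined object."""

    def __init__(self) -> None:
        self._objects: Dict[str, MVMetadata] = {}
        self._cache: Dict[str, MVMetadata] = {}
        self.fetches = 0  # counts reads that reach the simulated storage

    def fetch(self, ident: str) -> MVMetadata:
        self.fetches += 1
        self._cache[ident] = self._objects[ident]
        return self._cache[ident]

    def cached_or_fetch(self, ident: str) -> MVMetadata:
        return self._cache[ident] if ident in self._cache else self.fetch(ident)

    def write(self, ident: str, mv: MVMetadata) -> None:
        self._objects[ident] = mv
        self._cache[ident] = mv


class ViewLens:
    """View-catalog lens: reads and changes only the view part."""

    def __init__(self, store: MVStore) -> None:
        self.store = store

    def create_view(self, ident: str, view: dict, materialized: bool = False) -> None:
        if not materialized:
            raise NotImplementedError("plain views are outside this sketch")
        self.store.write(ident, MVMetadata(view=view))

    def load_view(self, ident: str) -> dict:
        # Fetch and cache the whole combined object, return only the view part.
        return self.store.fetch(ident).view

    def commit_view(self, ident: str, view: dict) -> None:
        current = self.store.cached_or_fetch(ident)
        self.store.write(ident, replace(current, view=view))


class TableLens:
    """Table-catalog lens: reads and changes only the table part."""

    def __init__(self, store: MVStore) -> None:
        self.store = store

    def load_table(self, ident: str) -> dict:
        # Served from the cached combined object when load_view ran first.
        return self.store.cached_or_fetch(ident).table

    def commit_table(self, ident: str, table: dict) -> None:
        current = self.store.cached_or_fetch(ident)
        self.store.write(ident, replace(current, table=table))
```

Both lenses take the same identifier, so a caller would create the materialized view through ViewLens.create_view with materialized=True and then commit refreshes through TableLens.commit_table without touching the view part.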