Thanks Jack! I feel Question 0 is very broad, essentially capturing the whole design. Can we start by discussing more granular questions?
On Wed, Feb 21, 2024 at 8:53 PM Jack Ye <yezhao...@gmail.com> wrote: > Thanks everyone for the help in organizing the thoughts! > > I have moved the summary of everyone's comments here also to the doc that > Jan linked under question 0. We can continue to have more discussions there > and cast votes! > > Best, > Jack Ye > > On Wed, Feb 21, 2024 at 12:14 PM Jan Kaul <jank...@mailbox.org.invalid> > wrote: > >> Thanks Micah, I think the voting chips are great. >> >> @Szehon, actually what I had in mind was not to have one thread per >> question but rather have smaller threads that can be resolved more easily. >> I have the fear that one thread for the current question would lead to a >> very long and unmanageable discussion. >> >> I've added another row to the table where everyone could provide a >> summary of their reason for choosing a certain design. This way we could >> move some of the content from the comment threads to the main document. >> On 21.02.24 19:58, Micah Kornfield wrote: >> >> Of course we also need threads that express our preferences (voting). I >>> would suggest to keep these separate from discussions about single points >>> so that they can be persisted in the document. >> >> >> Not sure if it helpful, but I added voting chips Question 0, as maybe an >> easier way to keep track of votes. If it is helpful, I can add them in >> other places that still need a vote (I think one needs a paid Google Docs >> account to insert them). >> >> Thanks, >> Micah >> >> On Wed, Feb 21, 2024 at 10:23 AM Szehon Ho <szehon.apa...@gmail.com> >> wrote: >> >>> Thanks Jan. +1 on having just one thread per question for >>> vote/preference. Where do you suggest we have it, on the discussion >>> question itself? It would be to keep the existing threads and move it >>> there. >>> >>> Also, I think it makes sense with making a slack channel (for quick >>> question, reply) , and also discuss unresolved questions in the next week's >>> sync or a separate meeting. >>> >>> On Wed, Feb 21, 2024 at 12:40 AM Jan Kaul <jank...@mailbox.org.invalid> >>> <jank...@mailbox.org.invalid> wrote: >>> >>>> Thank you Jack for driving the consensus for the MV spec and thank you >>>> all for the discussion. >>>> >>>> I really like the idea about incremental consensus because we often >>>> loose sight in detailed discussions. As Jack mentioned, the highest >>>> priority question currently is: >>>> >>>> *Should the Iceberg MV be realized as a view + storage table or do we >>>> define a new metadata format? *To have one place for the discussion, I >>>> created another Question ( >>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1#heading=h.y70rtfhi9qxi) >>>> to the Materialized View Spec google document. >>>> >>>> To improve the visibility of the arguments I would like to propose a >>>> new process. It would be great if all relevant information is stored in the >>>> document itself. Therefore I would suggest to use the comment threads for >>>> smaller, temporary discussions which can be resolved by adding the points >>>> to the main document. Please close the threads if the information was added >>>> to the document. Additionally, I gave you all permissions to edit the >>>> documents, so you can add missing points yourselves. >>>> >>>> Of course we also need threads that express our preferences (voting). I >>>> would suggest to keep these separate from discussions about single points >>>> so that they can be persisted in the document. >>>> >>>> After a phase of collecting arguments for the different designs I think >>>> it would make sense to have video call to have a face to face discussion. >>>> >>>> What do you think? >>>> >>>> Best wishes, >>>> >>>> Jan >>>> On 20.02.24 21:32, Manish Malhotra wrote: >>>> >>>> Very excited for MV to be in Iceberg :) >>>> Keeping in the same doc. would be helpful, to have the trail. >>>> But also agreed, if there are too many directions/threads, then keep >>>> closing the old one, if there are no more questions. >>>> And put down the assumptions for the initial version to move forward. >>>> >>>> >>>> On Tue, Feb 20, 2024 at 12:17 PM Walaa Eldin Moustafa < >>>> wa.moust...@gmail.com> wrote: >>>> >>>>> I would vote to keep a log in the doc with open questions, and keep >>>>> the doc updated with open questions as they arise/get resolved. >>>>> >>>>> On Tue, Feb 20, 2024 at 11:37 AM Jack Ye <yezhao...@gmail.com> wrote: >>>>> >>>>>> Thanks for the response from everyone! >>>>>> >>>>>> Before proceeding further, I see a few people referring back to the >>>>>> current design from Jan. I specifically raised this thread based on the >>>>>> information in the doc and a few latest discussions we had there. Because >>>>>> there are many threads in the doc, and each thread points further to >>>>>> other >>>>>> discussion threads in the same doc or other doc, it is now quite hard to >>>>>> follow and continue discussing all different topics there. >>>>>> >>>>>> I hope we can make incremental consensus of the questions in the doc >>>>>> through devlist, because it provides more visibility, and also a single >>>>>> thread instead of multiple threads going on at the same time. If we think >>>>>> this format is not effective, I propose that we create a new mv channel >>>>>> in >>>>>> Iceberg Slack workspace, and people interested can join and discuss all >>>>>> these points directly. What do we think? >>>>>> >>>>>> Best, >>>>>> Jack Ye >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Feb 19, 2024 at 6:03 PM Szehon Ho <szehon.apa...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Great to see more discussion on the MV spec. Actually, Jan's >>>>>>> document "Iceberg Materialized View Spec" >>>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A> >>>>>>> has >>>>>>> been organized , with a "Design Questions" section to track these >>>>>>> debates, >>>>>>> and it would be nice to centralize the debates there, as Micah mentions. >>>>>>> >>>>>>> For Dan's question, I think this debate was tracked in "Design Question >>>>>>> 3: Should the storage table be registered in the catalog?". I think the >>>>>>> general idea there was to not expose it directly via Catalog as it is >>>>>>> then >>>>>>> exposed to user modification. If the engine wants to access anything >>>>>>> about >>>>>>> the storage table (including audit and storage), it is of course there >>>>>>> via >>>>>>> the storage table pointer. I think Walaa's point is also good, we could >>>>>>> expose it as we expose metadata tables, but I am still not sure if >>>>>>> there is >>>>>>> still some use-cases of engine access not covered? >>>>>>> >>>>>>> It is true that for Jack's initial question (Do we really want to go >>>>>>> with the MV = view + storage table design approach for Iceberg MV?), >>>>>>> unfortunately we did not capture it as a "Design Question" in Jan's >>>>>>> doc, as >>>>>>> it was an implicit assumption of 'yes', because it is the choice of >>>>>>> Hive, >>>>>>> Trino, and other engines , as others have pointed out. >>>>>>> >>>>>>> Jack's point about potential evolution of MV (like to add >>>>>>> partitioning) is an interesting one, but definitely hard to grasp. I >>>>>>> think >>>>>>> it makes sense to add this as a separate Design Question in the doc, and >>>>>>> add the options. This will allow us to flesh out this alternative >>>>>>> option(s). Maybe Micah's point about modifying existing proposal to >>>>>>> 'embed' the required table metadata fields in the existing view >>>>>>> metadata, >>>>>>> is one middle ground option. Or we add a totally new MV object spec for >>>>>>> MV, separate than existing View spec? >>>>>>> >>>>>>> Also , as Jack pointed out, it may make sense to have the REST / >>>>>>> Catalog API proposal in the doc to educate the above decision. >>>>>>> >>>>>>> Thanks >>>>>>> Szehon >>>>>>> >>>>>>> On Mon, Feb 19, 2024 at 4:08 PM Walaa Eldin Moustafa < >>>>>>> wa.moust...@gmail.com> wrote: >>>>>>> >>>>>>>> I think it would help if we answer the question of whether an MV is >>>>>>>> a view + storage table (and degree of exposing this underlying >>>>>>>> implementation) in the context of the user interfacing with those >>>>>>>> concepts: >>>>>>>> >>>>>>>> For the end user, interfacing with the engine APIs (e.g., through >>>>>>>> SQL), materialized view APIs should be almost the same as regular view >>>>>>>> APIs >>>>>>>> (except for operations specific to materialized views like REFRESH >>>>>>>> command >>>>>>>> etc). Typically, the end user interacts with the (materialized) view >>>>>>>> object >>>>>>>> as a view, and the engine performs the abstraction over the storage >>>>>>>> table. >>>>>>>> >>>>>>>> For the engines interfacing with Iceberg, it sounds the correct >>>>>>>> abstraction at this layer is indeed view + storage table, and engines >>>>>>>> could >>>>>>>> have access to both objects to optimize queries. >>>>>>>> >>>>>>>> So in a sense, the engine will ultimately hide most of the >>>>>>>> storage detail from the end user (except for advanced users who want to >>>>>>>> explicitly access the storage table with a modifier like >>>>>>>> "db.view.storageTable" -- and they can only read it), while Iceberg >>>>>>>> will >>>>>>>> expose the storage details to the engine catalog to use it in scans if >>>>>>>> needed. So the storage table is hidden or exposed based on the >>>>>>>> context/the >>>>>>>> actual users. From Iceberg point of view (which interacts with the >>>>>>>> engines), the storage table is exposed. Note that this does not >>>>>>>> necessarily mean that the storage table is registered in the catalog >>>>>>>> with >>>>>>>> its own independent name (e.g., where we can drop the view but keep the >>>>>>>> storage table and access it from the catalog). Addressing the storage >>>>>>>> table >>>>>>>> using a virtual namespace like "db.view.storageTable" sounds like a >>>>>>>> good >>>>>>>> middle ground. Anyways, end users should not need to directly access >>>>>>>> the >>>>>>>> storage table in most cases. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Walaa. >>>>>>>> >>>>>>>> On Mon, Feb 19, 2024 at 3:38 PM Micah Kornfield < >>>>>>>> emkornfi...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Jack, >>>>>>>>> >>>>>>>>> >>>>>>>>>> In my mind, the first key point we all need to agree upon to move >>>>>>>>>> this design forward is*: Do we really want to go with the MV = >>>>>>>>>> view + storage table design approach for Iceberg MV?* >>>>>>>>> >>>>>>>>> >>>>>>>>> I think we want this to the extent that we do not want to redefine >>>>>>>>> the same concept with different representations/naming to the greatest >>>>>>>>> degree possible. This is why borrowing the concepts from the view >>>>>>>>> (e.g. >>>>>>>>> multiple ways of expressing the same view logic in different >>>>>>>>> dialects) and >>>>>>>>> aspects of the materialized data (e.g. partitioning, ordering) feels >>>>>>>>> most >>>>>>>>> natural. IIUC your proposal, I think you are saying maybe two >>>>>>>>> modifications to the existing proposals in the document: >>>>>>>>> >>>>>>>>> 1. No separate storage table link, instead embed most of the >>>>>>>>> metadata of the materialized table into the MV document (the exception >>>>>>>>> seems to be snapshot history) >>>>>>>>> 2. For snapshot history, have one unified history specific to the >>>>>>>>> MV. >>>>>>>>> >>>>>>>>> This seems fairly reasonable to me and I think I can solve some >>>>>>>>> challenges with the existing proposal in an elegant way. If this is >>>>>>>>> correct (or maybe if it isn't quite correct) perhaps you can make >>>>>>>>> suggestions to the document so all of the trade-offs can be discussed >>>>>>>>> in >>>>>>>>> one place? >>>>>>>>> >>>>>>>>> I think the one thing the current draft of the materialized view >>>>>>>>> ignores is how to store algebraic summaries (e.g. separate sum and >>>>>>>>> count >>>>>>>>> for averages, or other sketches), so that new data can be >>>>>>>>> incrementally >>>>>>>>> incorporated. But representing these structures feels like it >>>>>>>>> potentially >>>>>>>>> has value beyond just MVs (e.g. it can be a natural way to express >>>>>>>>> summary >>>>>>>>> statistics in table metadata), so I think it deserves at least a try >>>>>>>>> in >>>>>>>>> incorporating the concepts in the table specification, so the >>>>>>>>> definitions >>>>>>>>> can be shared. I was imagining this could come as part of the next >>>>>>>>> revision of MV specification. >>>>>>>>> >>>>>>>>> The MV internal structure could evolve in a way that works more >>>>>>>>>> efficiently with the reduced scope of functionalities, without >>>>>>>>>> relying on >>>>>>>>>> table to offer the same capabilities. I can at least say that is >>>>>>>>>> true based >>>>>>>>>> on my internal knowledge of how Redshift MVs work. >>>>>>>>> >>>>>>>>> >>>>>>>>> I'm not sure I fully understand this point, but it seems the main >>>>>>>>> question here is what would break if it started to evolve in this >>>>>>>>> direction. Is it purely additive or do we suspect some elements >>>>>>>>> would need >>>>>>>>> to be removed? My gut feeling here is the main concerns here are >>>>>>>>> getting >>>>>>>>> the cardinatities correct (i.e. 1 MV should probably have 0, 1 or more >>>>>>>>> materialized storage tables associated with it, to support more >>>>>>>>> advanced >>>>>>>>> algebraic structures listed above, and perhaps a second without them, >>>>>>>>> and >>>>>>>>> additional metadata to distinguish between these two different modes). >>>>>>>>> >>>>>>>>> If after the evaluation, we are confident that the MV = view + >>>>>>>>>> storage table approach is the right way to go, then we can debate >>>>>>>>>> the other >>>>>>>>>> issues, and I think the next issue to reach consensus should be >>>>>>>>>> "Should the >>>>>>>>>> storage table be registered in the catalog?". >>>>>>>>> >>>>>>>>> >>>>>>>>> I actually think there are actually more fundamental questions >>>>>>>>> posed: >>>>>>>>> 1. Should be considering how items should be modelled in the REST >>>>>>>>> API concurrently with the Iceberg spec, as that potentially impacts >>>>>>>>> design >>>>>>>>> decision (I think the answer is yes, and we should update the doc with >>>>>>>>> sketches on new endpoints and operations on the endpoints to ensure >>>>>>>>> things >>>>>>>>> align). >>>>>>>>> 2. Going forward should new aspects of Iceberg artifacts rely on >>>>>>>>> the fact that a catalog is present and we can rely on a naming >>>>>>>>> convention >>>>>>>>> for looking up other artifacts in a catalog as pointers (I lean yes on >>>>>>>>> this, but I'm a little bit more ambivalent). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Micah >>>>>>>>> >>>>>>>>> On Mon, Feb 19, 2024 at 12:52 PM Jack Ye <yezhao...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I suggest we need a step-by-step process to make incremental >>>>>>>>>> consensus, otherwise we are constantly talking about many different >>>>>>>>>> debates >>>>>>>>>> at the same time. >>>>>>>>>> >>>>>>>>>> In my mind, the first key point we all need to agree upon to move >>>>>>>>>> this design forward is*: Do we really want to go with the MV = >>>>>>>>>> view + storage table design approach for Iceberg MV?* >>>>>>>>>> >>>>>>>>>> I think we (at least me) started with this assumption, mostly >>>>>>>>>> because this is how Trino implements MV, and how Hive tables store MV >>>>>>>>>> information today. But does it mean we should design it that way in >>>>>>>>>> Iceberg? >>>>>>>>>> >>>>>>>>>> Now I look back at how we did the view spec design, we could also >>>>>>>>>> say that we just add a representation field in the table spec to >>>>>>>>>> store >>>>>>>>>> view, and an Iceberg view is just a table with no data but with >>>>>>>>>> representations defined. But we did not do that. So it feels now >>>>>>>>>> quite >>>>>>>>>> inconsistent to say we want to just add a few fields in the table >>>>>>>>>> and view >>>>>>>>>> spec to call it an Iceberg MV. >>>>>>>>>> >>>>>>>>>> If we look into most of the other database systems (e.g. >>>>>>>>>> Redshift, BigQuery, Snowflake), they never expose such implementation >>>>>>>>>> details like storage table. Apart from being close-sourced systems, >>>>>>>>>> I think >>>>>>>>>> it is also for good technical reasons. There are many more things >>>>>>>>>> that a >>>>>>>>>> table needs to support, but does not really apply to MV. The MV >>>>>>>>>> internal >>>>>>>>>> structure could evolve in a way that works more efficiently with the >>>>>>>>>> reduced scope of functionalities, without relying on table to offer >>>>>>>>>> the >>>>>>>>>> same capabilities. I can at least say that is true based on my >>>>>>>>>> internal >>>>>>>>>> knowledge of how Redshift MVs work. >>>>>>>>>> >>>>>>>>>> I think we should fully evaluate both directions, and commit to >>>>>>>>>> one first before debating more things. >>>>>>>>>> >>>>>>>>>> If we have a new and independent Iceberg MV spec, then an Iceberg >>>>>>>>>> MV is under-the-hood a single object containing all MV information. >>>>>>>>>> It has >>>>>>>>>> its own name, snapshots, view representation, etc. I don't believe >>>>>>>>>> we will >>>>>>>>>> be blocked by Trino due to its MV SPIs currently requiring the >>>>>>>>>> existence of >>>>>>>>>> a storage table, as it will just be a different implementation from >>>>>>>>>> the >>>>>>>>>> existing one in Trino-Iceberg. In this direction, I don't think we >>>>>>>>>> need to >>>>>>>>>> have any further debate about pointers, metadata locations, storage >>>>>>>>>> table, >>>>>>>>>> etc. because everything will be new. >>>>>>>>>> >>>>>>>>>> If after the evaluation, we are confident that the MV = view + >>>>>>>>>> storage table approach is the right way to go, then we can debate >>>>>>>>>> the other >>>>>>>>>> issues, and I think the next issue to reach consensus should be >>>>>>>>>> "Should the >>>>>>>>>> storage table be registered in the catalog?". >>>>>>>>>> >>>>>>>>>> What do we think? >>>>>>>>>> >>>>>>>>>> -Jack >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 19, 2024 at 11:32 AM Daniel Weeks <dwe...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Jack, >>>>>>>>>>> >>>>>>>>>>> I think we should consider either allowing the storage table to >>>>>>>>>>> be fully exposed/addressable via the catalog or allow access via >>>>>>>>>>> namespacing like with metadata tables. E.g. >>>>>>>>>>> <catalog>.<database>.<table>.<storage>, which would allow for full >>>>>>>>>>> access >>>>>>>>>>> to the underlying table. >>>>>>>>>>> >>>>>>>>>>> For other engines to interact with the storage table (e.g. to >>>>>>>>>>> execute the query to materialize the table), it may be necessary >>>>>>>>>>> that the >>>>>>>>>>> table is fully addressable. Whether the storage table is returned >>>>>>>>>>> as part >>>>>>>>>>> of list operations may be something we leave up to the catalog >>>>>>>>>>> implementation. >>>>>>>>>>> >>>>>>>>>>> I don't think the table should reference a physical location >>>>>>>>>>> (only a logical reference) since things will be changing behind the >>>>>>>>>>> view >>>>>>>>>>> definition and I'm not confident we want to have to update the view >>>>>>>>>>> representation everytime the storage table is updated. >>>>>>>>>>> >>>>>>>>>>> I think there's still some exploration as to whether we need to >>>>>>>>>>> model this as separate from view endpoints, but there may be enough >>>>>>>>>>> overlap >>>>>>>>>>> that it's not necessary to have yet another set of endpoints for >>>>>>>>>>> materialized views (maybe filter params if you need to >>>>>>>>>>> distinguish?). >>>>>>>>>>> >>>>>>>>>>> -Dan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Feb 18, 2024 at 6:57 PM Renjie Liu < >>>>>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, Jack: >>>>>>>>>>>> >>>>>>>>>>>> Thanks for raising this. >>>>>>>>>>>> >>>>>>>>>>>> In most database systems, MV, view and table are considered >>>>>>>>>>>>> independent objects, at least at API level. It is very rare for a >>>>>>>>>>>>> system to >>>>>>>>>>>>> support operations like "materializing a logical view" or >>>>>>>>>>>>> "upgrading a >>>>>>>>>>>>> logical view to MV", because view and MV are very different in >>>>>>>>>>>>> almost every >>>>>>>>>>>>> aspect of user experience. Extending the existing view or table >>>>>>>>>>>>> spec to >>>>>>>>>>>>> accommodate MV might give us a MV implementation similar to the >>>>>>>>>>>>> current >>>>>>>>>>>>> Trino or Hive views, save us some effort and a few APIs in REST, >>>>>>>>>>>>> but it >>>>>>>>>>>>> binds us to a very specific design of MV, which we might regret >>>>>>>>>>>>> in the >>>>>>>>>>>>> future. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> When I reviewed the doc, I thought we were discussing the spec >>>>>>>>>>>> of materialized view, just like the spec of table metadata, but >>>>>>>>>>>> didn't not >>>>>>>>>>>> the user facing api. I would definitely agree that we should >>>>>>>>>>>> consider MV as >>>>>>>>>>>> another kind of database object in user facing api, even though >>>>>>>>>>>> it's >>>>>>>>>>>> internally modelled as a view + storage table pointer. >>>>>>>>>>>> >>>>>>>>>>>> If we want to make the REST experience good for MV, I think we >>>>>>>>>>>>> should at least consider directly describing the full metadata of >>>>>>>>>>>>> the >>>>>>>>>>>>> storage table in Iceberg view, instead of pointing to a JSON file. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Do you mean we need to add components like >>>>>>>>>>>> `LoadMaterializedViewResponse`, if so, I would +1 for this. >>>>>>>>>>>> >>>>>>>>>>>> *Q2: what REST APIs do we expect to use for interactions with >>>>>>>>>>>>> MVs?* >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> As I have mentioned above, I think we should consider MV as >>>>>>>>>>>> another database object, so I think we should add a set of apis >>>>>>>>>>>> specifically designed for MV, such as `loadMV`, `freshMV`. >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Feb 17, 2024 at 11:14 AM Jack Ye <yezhao...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> As we are discussing the spec change for materialized view, >>>>>>>>>>>>> there has been a missing aspect that is technically also related, >>>>>>>>>>>>> and might >>>>>>>>>>>>> affect the MV spec design: *how do we want to add MV support >>>>>>>>>>>>> to the REST spec?* >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to discuss this in a new thread to collect >>>>>>>>>>>>> people's thoughts. This topic expands to the following 2 >>>>>>>>>>>>> sub-questions: >>>>>>>>>>>>> >>>>>>>>>>>>> *Q1: how would the MV spec change affect the REST spec?* >>>>>>>>>>>>> In the current proposal, it looks like we are using a design >>>>>>>>>>>>> where a MV is modeled as an Iceberg view linking to an Iceberg >>>>>>>>>>>>> storage >>>>>>>>>>>>> table. At the same time, we do not want to expose this storage >>>>>>>>>>>>> table in the >>>>>>>>>>>>> catalog, thus the Iceberg view has a pointer to only a metadata >>>>>>>>>>>>> JSON file >>>>>>>>>>>>> of the Iceberg storage table. Each MV refresh updates the pointer >>>>>>>>>>>>> to a new >>>>>>>>>>>>> metadata JSON file. >>>>>>>>>>>>> >>>>>>>>>>>>> I feel this does not play very well with the direction that >>>>>>>>>>>>> REST is going. The REST catalog is trying to remove the >>>>>>>>>>>>> dependency to the >>>>>>>>>>>>> metadata JSON file. For example, in LoadTableResponse the only >>>>>>>>>>>>> required >>>>>>>>>>>>> field is the metadata, and metadata-location is actually optional. >>>>>>>>>>>>> >>>>>>>>>>>>> If we want to make the REST experience good for MV, I think we >>>>>>>>>>>>> should at least consider directly describing the full metadata of >>>>>>>>>>>>> the >>>>>>>>>>>>> storage table in Iceberg view, instead of pointing to a JSON file. >>>>>>>>>>>>> >>>>>>>>>>>>> *Q2: what REST APIs do we expect to use for interactions with >>>>>>>>>>>>> MVs?* >>>>>>>>>>>>> So far we have been thinking about amending the view spec to >>>>>>>>>>>>> accommodate MV. This entails likely having MVs also being handled >>>>>>>>>>>>> through >>>>>>>>>>>>> the view APIs in REST spec. >>>>>>>>>>>>> >>>>>>>>>>>>> We need to agree with that first in the community, because >>>>>>>>>>>>> this has various implications, and I am not really sure at this >>>>>>>>>>>>> point if it >>>>>>>>>>>>> is the best way to go. >>>>>>>>>>>>> >>>>>>>>>>>>> If MV interactions are through the view APIs, the view APIs >>>>>>>>>>>>> need to be updated to accommodate MV constructs that are not >>>>>>>>>>>>> really related >>>>>>>>>>>>> to logical views. In fact, most actions performed on MVs are more >>>>>>>>>>>>> similar >>>>>>>>>>>>> to actions performed on table rather than view, which involve >>>>>>>>>>>>> configuring >>>>>>>>>>>>> data layout, read and write constructs. For example, users might >>>>>>>>>>>>> run >>>>>>>>>>>>> something like: >>>>>>>>>>>>> >>>>>>>>>>>>> CREATE MATERIALIZED VIEW mv >>>>>>>>>>>>> PARTITION BY col1 >>>>>>>>>>>>> CLUSTER BY col2 >>>>>>>>>>>>> AS ( // some sql ) >>>>>>>>>>>>> >>>>>>>>>>>>> then the CreateView API needs to accept partition spec and >>>>>>>>>>>>> sort order that are completely not relevant for logical views. >>>>>>>>>>>>> >>>>>>>>>>>>> When reading a MV, we might even want to have a >>>>>>>>>>>>> PlanMaterializedView API similar to the PlanTable API we are >>>>>>>>>>>>> adding. >>>>>>>>>>>>> >>>>>>>>>>>>> *My personal take* >>>>>>>>>>>>> It feels like we need to reconsider the question of what is >>>>>>>>>>>>> the best way to model MV in Iceberg. Should it be (1) a view >>>>>>>>>>>>> linked to a >>>>>>>>>>>>> storage table, or (2) a table with a view SQL associated with it, >>>>>>>>>>>>> or (3) >>>>>>>>>>>>> it's a completely independent thing. This topic was discussed in >>>>>>>>>>>>> the past in >>>>>>>>>>>>> this doc >>>>>>>>>>>>> <https://docs.google.com/document/d/1QAuy-meSZ6Oy37iPym8sV_n7R2yKZOHunVR-ZWhhZ6Q/edit?pli=1>, >>>>>>>>>>>>> but at that time we did not have much perspective about aspects >>>>>>>>>>>>> like REST >>>>>>>>>>>>> spec, and the view integration was also not fully completed yet. >>>>>>>>>>>>> With the >>>>>>>>>>>>> new knowledge, currently I am actually leaning a bit more towards >>>>>>>>>>>>> (3). >>>>>>>>>>>>> >>>>>>>>>>>>> In most database systems, MV, view and table are considered >>>>>>>>>>>>> independent objects, at least at API level. It is very rare for a >>>>>>>>>>>>> system to >>>>>>>>>>>>> support operations like "materializing a logical view" or >>>>>>>>>>>>> "upgrading a >>>>>>>>>>>>> logical view to MV", because view and MV are very different in >>>>>>>>>>>>> almost every >>>>>>>>>>>>> aspect of user experience. Extending the existing view or table >>>>>>>>>>>>> spec to >>>>>>>>>>>>> accommodate MV might give us a MV implementation similar to the >>>>>>>>>>>>> current >>>>>>>>>>>>> Trino or Hive views, save us some effort and a few APIs in REST, >>>>>>>>>>>>> but it >>>>>>>>>>>>> binds us to a very specific design of MV, which we might regret >>>>>>>>>>>>> in the >>>>>>>>>>>>> future. >>>>>>>>>>>>> >>>>>>>>>>>>> If we make a new MV spec, it can be made up of fields that >>>>>>>>>>>>> already exist in the table and view specs, but it is a whole new >>>>>>>>>>>>> spec. In >>>>>>>>>>>>> this way, the spec can evolve independently to accommodate MV >>>>>>>>>>>>> specific >>>>>>>>>>>>> features, and we can also create MV-related REST endpoints that >>>>>>>>>>>>> will evolve >>>>>>>>>>>>> independently from table and view REST APIs. >>>>>>>>>>>>> >>>>>>>>>>>>> But on the other side it is definitely associated with more >>>>>>>>>>>>> work to maintain a new spec, and potentially big refactoring in >>>>>>>>>>>>> the >>>>>>>>>>>>> codebase to make sure operations today that work on table or view >>>>>>>>>>>>> can now >>>>>>>>>>>>> support MV as a different object. And it definitely has other >>>>>>>>>>>>> problems that >>>>>>>>>>>>> I have overlooked. I would greatly appreciate any thoughts about >>>>>>>>>>>>> this! >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Jack Ye >>>>>>>>>>>>> >>>>>>>>>>>>>