Hi, Jack: Thanks for raising this.
In most database systems, MV, view and table are considered independent > objects, at least at API level. It is very rare for a system to support > operations like "materializing a logical view" or "upgrading a logical view > to MV", because view and MV are very different in almost every aspect of > user experience. Extending the existing view or table spec to accommodate > MV might give us a MV implementation similar to the current Trino or Hive > views, save us some effort and a few APIs in REST, but it binds us to a > very specific design of MV, which we might regret in the future. When I reviewed the doc, I thought we were discussing the spec of materialized view, just like the spec of table metadata, but didn't not the user facing api. I would definitely agree that we should consider MV as another kind of database object in user facing api, even though it's internally modelled as a view + storage table pointer. If we want to make the REST experience good for MV, I think we should at > least consider directly describing the full metadata of the storage table > in Iceberg view, instead of pointing to a JSON file. Do you mean we need to add components like `LoadMaterializedViewResponse`, if so, I would +1 for this. *Q2: what REST APIs do we expect to use for interactions with MVs?* As I have mentioned above, I think we should consider MV as another database object, so I think we should add a set of apis specifically designed for MV, such as `loadMV`, `freshMV`. On Sat, Feb 17, 2024 at 11:14 AM Jack Ye <yezhao...@gmail.com> wrote: > Hi everyone, > > As we are discussing the spec change for materialized view, there has been > a missing aspect that is technically also related, and might affect the MV > spec design: *how do we want to add MV support to the REST spec?* > > I would like to discuss this in a new thread to collect people's thoughts. > This topic expands to the following 2 sub-questions: > > *Q1: how would the MV spec change affect the REST spec?* > In the current proposal, it looks like we are using a design where a MV is > modeled as an Iceberg view linking to an Iceberg storage table. At the same > time, we do not want to expose this storage table in the catalog, thus the > Iceberg view has a pointer to only a metadata JSON file of the Iceberg > storage table. Each MV refresh updates the pointer to a new metadata JSON > file. > > I feel this does not play very well with the direction that REST is going. > The REST catalog is trying to remove the dependency to the metadata JSON > file. For example, in LoadTableResponse the only required field is the > metadata, and metadata-location is actually optional. > > If we want to make the REST experience good for MV, I think we should at > least consider directly describing the full metadata of the storage table > in Iceberg view, instead of pointing to a JSON file. > > *Q2: what REST APIs do we expect to use for interactions with MVs?* > So far we have been thinking about amending the view spec to accommodate > MV. This entails likely having MVs also being handled through the view APIs > in REST spec. > > We need to agree with that first in the community, because this has > various implications, and I am not really sure at this point if it is the > best way to go. > > If MV interactions are through the view APIs, the view APIs need to be > updated to accommodate MV constructs that are not really related to logical > views. In fact, most actions performed on MVs are more similar to actions > performed on table rather than view, which involve configuring data layout, > read and write constructs. For example, users might run something like: > > CREATE MATERIALIZED VIEW mv > PARTITION BY col1 > CLUSTER BY col2 > AS ( // some sql ) > > then the CreateView API needs to accept partition spec and sort order that > are completely not relevant for logical views. > > When reading a MV, we might even want to have a PlanMaterializedView API > similar to the PlanTable API we are adding. > > *My personal take* > It feels like we need to reconsider the question of what is the best way > to model MV in Iceberg. Should it be (1) a view linked to a storage table, > or (2) a table with a view SQL associated with it, or (3) it's a completely > independent thing. This topic was discussed in the past in this doc > <https://docs.google.com/document/d/1QAuy-meSZ6Oy37iPym8sV_n7R2yKZOHunVR-ZWhhZ6Q/edit?pli=1>, > but at that time we did not have much perspective about aspects like REST > spec, and the view integration was also not fully completed yet. With the > new knowledge, currently I am actually leaning a bit more towards (3). > > In most database systems, MV, view and table are considered independent > objects, at least at API level. It is very rare for a system to support > operations like "materializing a logical view" or "upgrading a logical view > to MV", because view and MV are very different in almost every aspect of > user experience. Extending the existing view or table spec to accommodate > MV might give us a MV implementation similar to the current Trino or Hive > views, save us some effort and a few APIs in REST, but it binds us to a > very specific design of MV, which we might regret in the future. > > If we make a new MV spec, it can be made up of fields that already exist > in the table and view specs, but it is a whole new spec. In this way, the > spec can evolve independently to accommodate MV specific features, and we > can also create MV-related REST endpoints that will evolve independently > from table and view REST APIs. > > But on the other side it is definitely associated with more work to > maintain a new spec, and potentially big refactoring in the codebase to > make sure operations today that work on table or view can now support MV as > a different object. And it definitely has other problems that I have > overlooked. I would greatly appreciate any thoughts about this! > > Best, > Jack Ye > >