Hi, Jack:

Thanks for raising this.

In most database systems, MV, view and table are considered independent
> objects, at least at API level. It is very rare for a system to support
> operations like "materializing a logical view" or "upgrading a logical view
> to MV", because view and MV are very different in almost every aspect of
> user experience. Extending the existing view or table spec to accommodate
> MV might give us a MV implementation similar to the current Trino or Hive
> views, save us some effort and a few APIs in REST, but it binds us to a
> very specific design of MV, which we might regret in the future.


When I reviewed the doc, I thought we were discussing the spec of
materialized view, just like the spec of table metadata, but didn't not the
user facing api. I would definitely agree that we should consider MV as
another kind of database object in user facing api, even though it's
internally modelled as a view + storage table pointer.

If we want to make the REST experience good for MV, I think we should at
> least consider directly describing the full metadata of the storage table
> in Iceberg view, instead of pointing to a JSON file.


Do you mean we need to add components like `LoadMaterializedViewResponse`,
if so, I would +1 for this.

*Q2: what REST APIs do we expect to use for interactions with MVs?*


As I have mentioned above,  I think we should consider MV as another
database object, so I think we should add a set of apis specifically
designed for MV, such as `loadMV`, `freshMV`.

On Sat, Feb 17, 2024 at 11:14 AM Jack Ye <yezhao...@gmail.com> wrote:

> Hi everyone,
>
> As we are discussing the spec change for materialized view, there has been
> a missing aspect that is technically also related, and might affect the MV
> spec design: *how do we want to add MV support to the REST spec?*
>
> I would like to discuss this in a new thread to collect people's thoughts.
> This topic expands to the following 2 sub-questions:
>
> *Q1: how would the MV spec change affect the REST spec?*
> In the current proposal, it looks like we are using a design where a MV is
> modeled as an Iceberg view linking to an Iceberg storage table. At the same
> time, we do not want to expose this storage table in the catalog, thus the
> Iceberg view has a pointer to only a metadata JSON file of the Iceberg
> storage table. Each MV refresh updates the pointer to a new metadata JSON
> file.
>
> I feel this does not play very well with the direction that REST is going.
> The REST catalog is trying to remove the dependency to the metadata JSON
> file. For example, in LoadTableResponse the only required field is the
> metadata, and metadata-location is actually optional.
>
> If we want to make the REST experience good for MV, I think we should at
> least consider directly describing the full metadata of the storage table
> in Iceberg view, instead of pointing to a JSON file.
>
> *Q2: what REST APIs do we expect to use for interactions with MVs?*
> So far we have been thinking about amending the view spec to accommodate
> MV. This entails likely having MVs also being handled through the view APIs
> in REST spec.
>
> We need to agree with that first in the community, because this has
> various implications, and I am not really sure at this point if it is the
> best way to go.
>
> If MV interactions are through the view APIs, the view APIs need to be
> updated to accommodate MV constructs that are not really related to logical
> views. In fact, most actions performed on MVs are more similar to actions
> performed on table rather than view, which involve configuring data layout,
> read and write constructs. For example, users might run something like:
>
> CREATE MATERIALIZED VIEW mv
> PARTITION BY col1
> CLUSTER BY col2
> AS ( // some sql )
>
> then the CreateView API needs to accept partition spec and sort order that
> are completely not relevant for logical views.
>
> When reading a MV, we might even want to have a PlanMaterializedView API
> similar to the PlanTable API we are adding.
>
> *My personal take*
> It feels like we need to reconsider the question of what is the best way
> to model MV in Iceberg. Should it be (1) a view linked to a storage table,
> or (2) a table with a view SQL associated with it, or (3) it's a completely
> independent thing. This topic was discussed in the past in this doc
> <https://docs.google.com/document/d/1QAuy-meSZ6Oy37iPym8sV_n7R2yKZOHunVR-ZWhhZ6Q/edit?pli=1>,
> but at that time we did not have much perspective about aspects like REST
> spec, and the view integration was also not fully completed yet. With the
> new knowledge, currently I am actually leaning a bit more towards (3).
>
> In most database systems, MV, view and table are considered independent
> objects, at least at API level. It is very rare for a system to support
> operations like "materializing a logical view" or "upgrading a logical view
> to MV", because view and MV are very different in almost every aspect of
> user experience. Extending the existing view or table spec to accommodate
> MV might give us a MV implementation similar to the current Trino or Hive
> views, save us some effort and a few APIs in REST, but it binds us to a
> very specific design of MV, which we might regret in the future.
>
> If we make a new MV spec, it can be made up of fields that already exist
> in the table and view specs, but it is a whole new spec. In this way, the
> spec can evolve independently to accommodate MV specific features, and we
> can also create MV-related REST endpoints that will evolve independently
> from table and view REST APIs.
>
> But on the other side it is definitely associated with more work to
> maintain a new spec, and potentially big refactoring in the codebase to
> make sure operations today that work on table or view can now support MV as
> a different object. And it definitely has other problems that I have
> overlooked. I would greatly appreciate any thoughts about this!
>
> Best,
> Jack Ye
>
>

Reply via email to