Hi everyone,

As we are discussing the spec change for materialized view, there has been
a missing aspect that is technically also related, and might affect the MV
spec design: *how do we want to add MV support to the REST spec?*

I would like to discuss this in a new thread to collect people's thoughts.
This topic expands to the following 2 sub-questions:

*Q1: how would the MV spec change affect the REST spec?*
In the current proposal, it looks like we are using a design where a MV is
modeled as an Iceberg view linking to an Iceberg storage table. At the same
time, we do not want to expose this storage table in the catalog, thus the
Iceberg view has a pointer to only a metadata JSON file of the Iceberg
storage table. Each MV refresh updates the pointer to a new metadata JSON
file.

I feel this does not play very well with the direction that REST is going.
The REST catalog is trying to remove the dependency to the metadata JSON
file. For example, in LoadTableResponse the only required field is the
metadata, and metadata-location is actually optional.

If we want to make the REST experience good for MV, I think we should at
least consider directly describing the full metadata of the storage table
in Iceberg view, instead of pointing to a JSON file.

*Q2: what REST APIs do we expect to use for interactions with MVs?*
So far we have been thinking about amending the view spec to accommodate
MV. This entails likely having MVs also being handled through the view APIs
in REST spec.

We need to agree with that first in the community, because this has various
implications, and I am not really sure at this point if it is the best way
to go.

If MV interactions are through the view APIs, the view APIs need to be
updated to accommodate MV constructs that are not really related to logical
views. In fact, most actions performed on MVs are more similar to actions
performed on table rather than view, which involve configuring data layout,
read and write constructs. For example, users might run something like:

CREATE MATERIALIZED VIEW mv
PARTITION BY col1
CLUSTER BY col2
AS ( // some sql )

then the CreateView API needs to accept partition spec and sort order that
are completely not relevant for logical views.

When reading a MV, we might even want to have a PlanMaterializedView API
similar to the PlanTable API we are adding.

*My personal take*
It feels like we need to reconsider the question of what is the best way to
model MV in Iceberg. Should it be (1) a view linked to a storage table, or
(2) a table with a view SQL associated with it, or (3) it's a completely
independent thing. This topic was discussed in the past in this doc
<https://docs.google.com/document/d/1QAuy-meSZ6Oy37iPym8sV_n7R2yKZOHunVR-ZWhhZ6Q/edit?pli=1>,
but at that time we did not have much perspective about aspects like REST
spec, and the view integration was also not fully completed yet. With the
new knowledge, currently I am actually leaning a bit more towards (3).

In most database systems, MV, view and table are considered independent
objects, at least at API level. It is very rare for a system to support
operations like "materializing a logical view" or "upgrading a logical view
to MV", because view and MV are very different in almost every aspect of
user experience. Extending the existing view or table spec to accommodate
MV might give us a MV implementation similar to the current Trino or Hive
views, save us some effort and a few APIs in REST, but it binds us to a
very specific design of MV, which we might regret in the future.

If we make a new MV spec, it can be made up of fields that already exist in
the table and view specs, but it is a whole new spec. In this way, the spec
can evolve independently to accommodate MV specific features, and we can
also create MV-related REST endpoints that will evolve independently from
table and view REST APIs.

But on the other side it is definitely associated with more work to
maintain a new spec, and potentially big refactoring in the codebase to
make sure operations today that work on table or view can now support MV as
a different object. And it definitely has other problems that I have
overlooked. I would greatly appreciate any thoughts about this!

Best,
Jack Ye

Reply via email to