Re: Dedicated sync for Iceberg materialized view

Benny Chow Thu, 23 Oct 2025 22:44:30 -0700

Hi Peter

I think the current proposal would support your example. In most situations, replace table operations after a view is materialized wouldn’t invalidate the materialization. However, if the view includes metadata columns, then the replace operations should invalidate the materialization.

This also brings up another important point that engines will differ on what views can be materialized or not. For example, maybe metadata columns are not allowed similar to non deterministic functions like random. But some engines like Dremio may allow views that use current date functions. It should be possible for one engine to materialize a view and another engine to look at the query tree and decide it’s not a view it supports materializations on and choose not to use that materialization.

Thanks

Benny

On Oct 23, 2025, at 8:44 AM, Péter Váry <[email protected]> wrote:

Hi All,

I’ve been catching up on the discussion and wanted to share an observation. One aspect that stands out to me in the proposed staleness evaluation logic is that snapshots which don’t modify data can still affect the view’s contents if the view includes metadata columns.

I was considering using a materialized view as an index for a given table to accelerate the conversion of equality deletes to position deletes. For example, the query might look like:

SELECT _POS, _FILE, id FROM target_table

During compaction, the materialized view would need to be refreshed to ensure it reflects the correct data.

Does this seem like a valid use case? Or should we explicitly exclude scenarios like this?

Thanks,
Peter

Steven Wu <[email protected]> ezt írta (időpont: 2025. okt. 20., H, 17:30):
Walaa,

> while Option 2 is described in your summary as "giving engines flexibility to determine freshness recursively beyond a source MV", that isn’t achievable under the MV evaluation model itself.
Because each MV treats upstream MVs as physical tables, recursion stops at the first materialized boundary; deeper staleness cannot be discovered without switching to a logical-view evaluation model, i.e., stepping outside the MV model altogether (note that in Option 3 we can determine recursive staleness while still inside the MV model).

In option 2, when determining the freshness of mv_3, engines can choose to recursively evaluate the freshness of mv_1 and mv_2 since they are also MVs. But engines can also choose not to.

> This means that there seems to be an implicit “Option 3”. This option treats MVs as logical views, i.e., storing only view versions + base table snapshot IDs (no MV storage snapshot IDs, no per-path lineage).

In the new option 3 you described, how could the engine update mv3's refresh state for base table_a and table_b? unless all connected MVs are refreshed and committed in one single transaction, one entry per base table doesn't seem feasible. That's the main reason for option 1 to require the lineage path information in refresh state for base tables.

It also seems that option 3 can only interpret freshness recursively, while today there are engines that support MVs without recursively evaluating source MVs.

Thanks,
Steven

On Mon, Oct 20, 2025 at 1:44 AM Walaa Eldin Moustafa <[email protected]> wrote:
Hi Steven,

Thanks for organizing the series and summarizing the outcome.

After re-reading the Option 1/2 proposal, initially I interpreted Option 1 as simply expanding MVs like regular logical views. On closer look, it is actually more complex. It also preserves per-path lineage state (e.g., multiple entries for the same base table via different parents), which increases expressiveness but significantly increases metadata complexity. So I agree it is not a practical option.

This means that there seems to be an implicit “Option 3”. This option treats MVs as logical views, i.e., storing only view versions + base table snapshot IDs (no MV storage snapshot IDs, no per-path lineage). Under this model, mv_3’s metadata might look like:

Type Name Tracked State
----- ------- -----------------------
view mv_1 view_version_id
view mv_2 view_version_id
table table_a table_snapshot_id
table table_b table_snapshot_id

This preserves logical semantics and aligns MV behavior with pure views.

If we choose Option 2 (treat source MV as a materialized table), we may have to be consider those constraints:

* Staleness only degrades up the chain. mv_1 and mv_2 may already be stale relative to the base tables, but if mv_3 is refreshed using their storage snapshots, then mv_3 will be marked as fresh under Option 2, even though all three MVs are stale relative to the base tables.

* Engines can no longer discover staleness beyond mv_1. Once mv_3 sees mv_1 (or mv_2) as fresh based only on their storage snapshots, it will not expand into mv_1 or mv_2 to check whether they are stale relative to the base tables.

* If mv_2 and mv_3 were purely logical views instead of MVs, they would evaluate directly against base tables and return newer data. Under Option 2, the same definitions but materialized upstream produce different data, not just different metadata.

Therefore, while Option 2 is described in your summary as "giving engines flexibility to determine freshness recursively beyond a source MV", that isn’t achievable under the MV evaluation model itself.
Because each MV treats upstream MVs as physical tables, recursion stops at the first materialized boundary; deeper staleness cannot be discovered without switching to a logical-view evaluation model, i.e., stepping outside the MV model altogether (note that in Option 3 we can determine recursive staleness while still inside the MV model).

Let me know your thoughts. I slightly prefer Option 3. I’m also fine with Option 2, but I don’t think the flexibility to recursively determine freshness actually exists under its evaluation model. Not sure if this changes anyone’s view, but I wanted to clarify how I’m reading it.

Thanks,
Walaa.

On Wed, Oct 8, 2025 at 11:11 PM Benny Chow <[email protected]> wrote:
I just listened to the recording. I'm the tech lead for MVs at Dremio and responsible for both refresh management and query rewrites with MVs.

It's great that we seem to agree that Iceberg MV spec won't require that MVs always be up to date in order to be usable for query rewrites. There can be many data consistency issues (as Dan pointed out) but that is the state of affairs today.

It sounds like we are converging on the following scenarios for an engine to validate the MV freshness:

1. Use storage table without any validation. This might be the extreme "async MV" example.
2. Ignore storage table even if one exists because SQL command or use case requires that.
3. Use storage table only if data is not more than x hours old. This can be achieved with the proposed refresh-start-timestamp-ms which is currently in the proposed spec. For this to work with MVs built on MVs, we should probably state in the spec that if a MV is built on another MV, then it needs to inherit the refresh-start-timestamp-ms of the child MV. In Steven's example, when building mv3, refresh-start-timestamp-ms needs to be set to the minimum of mv1 or mv2's refresh-start-timestamp-ms. If this property name is confusing, we can rename it to "refresh-earliest-table-timestamp-ms". I originally proposed this property and also listed out other benefits here:  https://github.com/apache/iceberg/pull/11041#discussion_r1779797796 Also, at the time, MVs built on MVs weren't being considered. Now that it is, I would recommend we have both "refresh-start-timestamp-ms" (when the refresh was started on the storage table) and "refresh-earliest-table-timestamp-ms" (used for freshness validation).
4. Don't use the storage table if it is older than X hours. This is what I had originally proposed for the materialization.max-stalessness-ms view property here:  https://github.com/apache/iceberg/pull/11041#discussion_r1744837644 It wasn't meant to validate the freshness but more to prevent use of a materialization after some criteria.
5. Use storage table if recursive validation passes... i.e. refresh-state matches the current expanded query tree state. This is what I think Steven is calling the "synchronous MV".

For scenario 1-4, it would support the nice use case of an Iceberg client using a view's data through the storage table without needing to know how to parse/validate/expand any view SQLs.

In Dremio's planner, we primarily use scenario 1 and 4 together to determine MV validity for query rewrite. Scenario 2 and 5 also apply in certain situations. For scenario 3, Dremio only exposes the "refresh-earliest-table-timestamp-ms" as an fyi to the user but it would be interesting to allow the user to set this time so that they could run queries and be 100% certain that they were not seeing data older than x hours.

Thanks
Benny

On Wed, Oct 8, 2025 at 3:37 PM Steven Wu <[email protected]> wrote:
correction for a typo.

Prashanth brought up another scenario of compaction/rewrite where a new snapshot was added with actual data change
-->
Prashanth brought up another scenario of compaction/rewrite where a new snapshot was added without actual data change

On Wed, Oct 8, 2025 at 2:12 PM Steven Wu <[email protected]> wrote:
Hi,

Thanks everyone for joining the MV discussion meeting. We will continue to have the recurring sync meeting on Wednesday 9 am (Pacific) every 3 weeks until we get to the finish line where Jan's MV spec PR [1] is merged. I have scheduled our next meeting on Oct 29 in the Iceberg dev events calendar.

Here is the video recording for today's meeting.
https://drive.google.com/file/d/1-nfhBPDWLoAFDu5cKP0rwLd_30HB6byR/view?usp=sharing

We mostly discussed freshness evaluation. Here is the meeting summary.
For tracking the refresh state for the source MV [2], the consensus is option 2 (treating source MV as a materialized table) which would give engines the flexibility on freshness determination (recursive beyond source MV or not).
Earlier design doc [3] discussed max staleness config. But it wasn't reflected in the spec PR. The general opinion is to add the config to the spec PR. The open question is whether the `materialization.max-staleness-ms` config should be added to the view metadata or the storage table metadata. Either can work. We just need to decide which makes a little better fit.
Prashanth brought up schema change with default value and how it may affect the MV refresh state (for SQL representation with select *). Jan mentioned that snapshot contains schema id when the snapshot was created. Engine can compare the snapshot schema id to the source table schema id during freshness evaluation. There is no need for additional schema info in refresh-state tracking in the storage table.
Prashanth brought up another scenario of compaction/rewrite where a new snapshot was added with actual data change. The general take is that the engine can optimize and decide that MV is fresh as the new snapshot doesn't have any data change.

We can add some clarifications in the spec PR for freshness evaluation based on the above discussions.

[1] https://github.com/apache/iceberg/pull/11041
[2] https://docs.google.com/document/d/1_StBW5hCQhumhIvgbdsHjyW0ED3dWMkjtNzyPp9Sfr8/edit?tab=t.0
[3] https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?tab=t.0#heading=h.3wigecex0zls

On Thu, Sep 25, 2025 at 9:27 AM Steven Wu <[email protected]> wrote:
Hi all,

Iceberg materialized view has been discussed in the community for a long time. Thanks Jan Kaul for driving the discussion and the spec PR. It has been stalled for a long time due to lack of consensus on 1 or 2 topics. In Wed's Iceberg community sync meeting, Talat brought up the question on how to move forward and if we can have a dedicated meeting for MV.

I have set up a meeting on Oct 8 (9-10 am Pacific). If you subscribe to the "Iceberg Dev Events" calendar, you should be able to see it. If not, here is the link: https://meet.google.com/nfe-guyq-pqf

We are going to discuss
* remaining open questions
* unresolved concerns
* the next step and hopefully some consensus on moving forward

MV spec PR is up to date. Jan has incorporated recent feedback. This should be the base of the discussion.
https://github.com/apache/iceberg/pull/11041

Dev discussion thread (a long-running thread started by Jan).
https://lists.apache.org/thread/y1vlpzbn2x7xookjkffcl08zzyofk5hf

The mail archive has broken lineage and doesn't show all replies. Email subject is "[DISCUSS] Iceberg Materialzied Views".

Thanks,
Steven

Re: Dedicated sync for Iceberg materialized view

Reply via email to