Re: Dedicated sync for Iceberg materialized view

Steven Wu Mon, 08 Dec 2025 14:54:05 -0800

Just a reminder for the MV community sync on Wednesday, Dec 10. Hope to see
you there.


Here are some topics

   - Revisit the source MV handling based on the recent comments in this
   thread.
   - Review the latest spec PR
   <https://github.com/apache/iceberg/pull/11041> update from Jan


On Fri, Dec 5, 2025 at 2:36 PM Steven Wu <[email protected]> wrote:

> Walaa/Benny, thanks a lot for the replies. I agree with some of the key points:
> * smart producers and dumb consumers
> * producer ... use two flat lists of source views and source tables
> * Freshness evaluation becomes inconsistent
> * recursion for freshness does not align with recursion for refresh
>
> Maybe we can revise the design while still allowing the flexibility for
> source MVs
> * Producers decide recursive or not for staleness evaluation semantics
> (maybe based on the refresh strategy)
>   - recursive: store the upstream views and tables of source MVs - expand
> beyond the source MV
>   - non-recursive: only store the source MV's storage table (in addition
> to the source MV's view) - *not* expanding beyond source MV
> * Consumers only evaluate the two flat lists of source-view-state and
> source-table-state, which makes the consumer logic simpler and consistent.
>
> Typically, there is only one producer (MV refresh engine). If we are
> concerned about multiple producers, we may introduce some config for
> recursive source MV semantics or not.
>
> > For max-staleness, I think it should strictly apply to only source
> tables and not storage tables.  For the ETL pipeline use case, the consumer
> is probably not going to care about this property.
>
> With the above revision, the max-staleness can still be applied to storage
> tables. Basically, the consumer just evaluates the staleness based on what
> the producer decides to put in the refresh-state.
>
> > we must consider the scenario of shared upstream tables (the "diamond
> pattern"). Specifically:
> > 1. Do we allow duplicate table entries in the list (e.g., if the same
> table was used at different snapshots across different refresh path
> traversals)?
> > 2. Would we need to include the path in the refresh state entry to make
> this data interpretable?
>
> For the diamond shape pattern
> <https://docs.google.com/document/d/1_StBW5hCQhumhIvgbdsHjyW0ED3dWMkjtNzyPp9Sfr8/edit?tab=t.0>,
> we probably need to include the *optional* path in the refresh state.
> Duplicate entries are difficult for human troubleshooting. If there is no
> common ancestor, the path can be omitted.
>
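The consumer side of the revision above can be illustrated with a minimal sketch. The `SourceViewState` and `SourceTableState` classes, their fields, and the `is_fresh` helper are hypothetical names chosen for illustration, not taken from the spec PR. The sketch assumes the consumer can look up the current view version and current table snapshot by name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceViewState:  # hypothetical shape, not the spec's field names
    name: str
    version_id: int


@dataclass(frozen=True)
class SourceTableState:  # hypothetical shape, not the spec's field names
    name: str
    snapshot_id: int


def is_fresh(view_states, table_states, current_versions, current_snapshots):
    """Compare each recorded entry against the catalog's current state.

    The consumer never expands nested MVs; it only walks the two flat lists
    that the producer chose to record.
    """
    views_match = all(
        current_versions.get(v.name) == v.version_id for v in view_states
    )
    tables_match = all(
        current_snapshots.get(t.name) == t.snapshot_id for t in table_states
    )
    return views_match and tables_match


# Non-recursive producer: records mv_1's view and mv_1's storage table only.
recorded_views = [SourceViewState("mv_1", 3)]
recorded_tables = [SourceTableState("mv_1_storage", 101)]

print(is_fresh(recorded_views, recorded_tables, {"mv_1": 3}, {"mv_1_storage": 101}))
print(is_fresh(recorded_views, recorded_tables, {"mv_1": 3}, {"mv_1_storage": 102}))
```

A recursive producer would record the upstream base tables in `recorded_tables` instead of the storage table; the consumer code would stay the same, which is the point of keeping the recursion decision on the producer side.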
>
> On Fri, Dec 5, 2025 at 1:53 PM Benny Chow <[email protected]> wrote:
>
>> >> Benny: Are you suggesting that the source-table-states should only
>> capture the leaf table nodes in the MV dependency pipeline?
>> Yes.  But to be clear with an example, suppose you have a MV like:
>>
>> CREATE MV MV1 as SELECT * FROM T1 UNION ALL SELECT * FROM V1
>>
>> And suppose V1 was defined as CREATE MV V1 as SELECT * FROM T1  --- Yes,
>> T1 again to make this example interesting.
>>
>> Then, I'm saying that the source-table-state for MV1 is going to somehow
>> combine the first T1 with the source table state from V1.
>>
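How a producer might combine the first T1 with V1's source table state can be sketched as follows. This is only an illustration under assumptions: the dictionary keys (`table`, `snapshot_id`, `timestamp_ms`) and the `flatten_table_states` helper are hypothetical, and the "keep the earliest snapshot" rule is just one of the policies floated in this thread, not something the spec mandates.

```python
def flatten_table_states(direct_states, nested_mv_states):
    """Merge directly-read tables with the table states of nested MVs.

    When the same table appears more than once (the diamond pattern), keep
    the entry with the earliest snapshot timestamp. This is only one possible
    producer policy; a producer could also reject the refresh instead.
    """
    merged = {}
    candidates = list(direct_states)
    for states in nested_mv_states:
        candidates.extend(states)
    for entry in candidates:
        kept = merged.get(entry["table"])
        if kept is None or entry["timestamp_ms"] < kept["timestamp_ms"]:
            merged[entry["table"]] = entry
    return sorted(merged.values(), key=lambda e: e["table"])


# MV1 reads T1 directly at a newer snapshot than the one V1 was built from.
mv1_direct = [{"table": "T1", "snapshot_id": 20, "timestamp_ms": 2_000}]
v1_state = [{"table": "T1", "snapshot_id": 10, "timestamp_ms": 1_000}]

mv1_state = flatten_table_states(mv1_direct, [v1_state])
print(mv1_state)
```

The result is a single T1 entry at snapshot 10, so the flat list stays free of duplicates at the cost of hiding that part of MV1 was computed from the newer snapshot.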
>> On Fri, Dec 5, 2025 at 1:25 PM Igor Belianski <[email protected]>
>> wrote:
>>
>>> Hi Benny and Walaa,
>>>
>>> Could you please clarify the following statement from Benny's last email:
>>>
>>> "We should avoid the need for consumers to expand nested MVs. I think
>>> the producer should be combining the refresh states of all the nested MVs
>>> it uses into two flat lists of source views and source tables. These source
>>> tables can't contain storage tables."
>>>
>>> Benny: Are you suggesting that the source-table-states should only
>>> capture the leaf table nodes in the MV dependency pipeline?
>>>
>>> Walaa: If we completely enumerate all source tables recursively, we must
>>> consider the scenario of shared upstream tables (the "diamond pattern").
>>> Specifically:
>>>
>>> 1. Do we allow duplicate table entries in the list (e.g., if the same
>>> table was used at different snapshots across different refresh path
>>> traversals)?
>>> 2. Would we need to include the path in the refresh state entry to make
>>> this data interpretable?
>>>
>>> If we pursue the option of listing everything in the tree, we should
>>> choose between:
>>>
>>> - A) Permissive: Allow duplicate table entries, treating the list as a
>>> client hint for tables to check. This leaves it up to engines to
>>> disambiguate or skip entries, and the list may not be strictly exhaustive.
>>> - B) Prescriptive: Establish an exactly defined meaning for each entry,
>>> mandating clear rules for aggregation.
>>>
>>> I am highly hesitant to mandate option B (it would obviously be too
>>> prescriptive for most engines).
>>>
>>> Thanks,
>>> Igor
>>>
>>> On Thu, Dec 4, 2025 at 9:28 PM Benny Chow <[email protected]> wrote:
>>>
>>>> I agree with Walaa.  In the last sync, we talked about smart producers
>>>> and dumb consumers.  We should avoid the need for consumers to expand
>>>> nested MVs.  I think the producer should be combining the refresh states of
>>>> all the nested MVs it uses into two flat lists of source views and source
>>>> tables.  These source tables can't contain storage tables.  When planning
>>>> the refresh job, the producer can choose to use the nested MV's storage
>>>> table or not and the refresh state needs to reflect this decision
>>>> accordingly.
>>>>
>>>> There's also a somewhat corner case to consider.  It is completely
>>>> possible for a source table to show up in a materialization at different
>>>> snapshots.  In this scenario, it's up to the producer to decide whether to
>>>> allow this or not or maybe just record the earliest snapshot.  These
>>>> scenarios are inevitable when you get MVs built on MVs built on MVs such as
>>>> in ETL scenarios.
>>>>
>>>> For max-staleness, I think it should strictly apply to only source
>>>> tables and not storage tables.  For the ETL pipeline use case, the consumer
>>>> is probably not going to care about this property.
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Dec 4, 2025 at 5:37 PM Walaa Eldin Moustafa <
>>>> [email protected]> wrote:
>>>>
>>>>> I think this creates significant friction for engine implementations
>>>>> (and contradicts some of the principles we established earlier):
>>>>>
>>>>> * When the engine sees the tables backing mv_3, the spec provides no
>>>>> built-in way to distinguish MV storage tables from true physical tables.
>>>>> The engine must always perform an external lookup to determine whether a
>>>>> “table” is really an MV.
>>>>>
>>>>> * Freshness evaluation becomes inconsistent: nested logical views
>>>>> require only a one-shot leaf-table comparison, while nested MVs require
>>>>> recursive traversal because their refresh-state does not contain leaf
>>>>> snapshots.
>>>>>
>>>>> * Even if an engine uses the MV definition to detect deeper staleness,
>>>>> it cannot refresh the MV to a consistent base-table state. Option 2
>>>>> refresh semantics stop at the immediate MV boundary, so recursion for
>>>>> freshness does not align with recursion for refresh.
>>>>>
>>>>> For these reasons, “allowing recursive expansion” is not practically
>>>>> usable. It introduces complexity without providing coherent semantics.
>>>>>
>>>>> In summary, treating MVs either as views or as tables yields a
>>>>> consistent model, but the optionality implied in Option 2 is misleading.
>>>>> The metadata does not support cleanly mixing the two modes.
>>>>>
>>>>> Thanks,
>>>>> Walaa.
>>>>>
>>>>> On Thu, Dec 4, 2025 at 11:44 AM Steven Wu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Walaa,
>>>>>>
>>>>>> We are saying the `refresh-state` only has a source view state for the
>>>>>> source MV and a source table state for the source MV's storage table. It
>>>>>> would allow both evaluation strategies (recursive or not):
>>>>>> * non-recursive: as long as the MV refresh state is aligned with the
>>>>>> source MV's storage table (with max staleness config), it is fresh. This
>>>>>> semantic matches many ETL pipeline use cases.
>>>>>> * recursive: If an engine wants to enforce stronger freshness
>>>>>> semantics, it can recursively evaluate if source mv_1 and mv_2 themselves
>>>>>> are fresh. The current spec wording mentions that this is allowed: "query
>>>>>> engines may recursively expand the query tree to determine freshness".
>>>>>>
>>>>>> We want the spec definition to be flexible enough to support both
>>>>>> use cases.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 3, 2025 at 5:30 PM Walaa Eldin Moustafa <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Steven,
>>>>>>>
>>>>>>> > In option 2, when determining the freshness of mv_3, engines can
>>>>>>> choose to recursively evaluate the freshness of mv_1 and mv_2 since they
>>>>>>> are also MVs. But engines can also choose not to.
>>>>>>>
>>>>>>> Does not "evaluating freshness of mv_1 and mv_2" mean that engines
>>>>>>> consider mv_1 and mv_2 as views? "Tables" do not have freshness.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 25, 2025 at 11:49 PM Jan Kaul via dev <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Thank you Steven,
>>>>>>>>
>>>>>>>> I've included the "max-staleness" in the PR. Please have a look and
>>>>>>>> give feedback on the phrasing.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jan
>>>>>>>> On 11/19/25 22:37, Steven Wu wrote:
>>>>>>>>
>>>>>>>> Thanks everyone for joining today's sync. We had a good discussion
>>>>>>>> on how to interpret the "max staleness" config.
>>>>>>>>
>>>>>>>> You can find the meeting notes here.
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1EVCM-hKr5tY33t0Yzq37cAXSPncySc6Ghke7OZEcqXU/edit?tab=t.0#heading=h.eho7jgm13usg
>>>>>>>>
>>>>>>>> Recording is also linked in the doc (thanks Kevin).
>>>>>>>>
>>>>>>>> For the next step, maybe we can collaborate on the MV spec PR to
>>>>>>>> flesh out the exact wording for the staleness config and semantics.
>>>>>>>> https://github.com/apache/iceberg/pull/11041/files
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 18, 2025 at 1:05 PM Benny Chow <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Igor.  The PR has a suggestion for exactly what you
>>>>>>>>> suggested.  I called it a "*warm*" state which is a state where
>>>>>>>>> stale materialization can still be used.
>>>>>>>>> https://github.com/apache/iceberg/pull/11041/files#r2474661166
>>>>>>>>>
>>>>>>>>> I think if we continue with the assumption that MVs can only
>>>>>>>>> reference Iceberg tables and views, then it makes sense for the
>>>>>>>>> max-staleness grace period to be dynamic based on snapshot history.
>>>>>>>>> This is what Trino does:
>>>>>>>>> https://trino.io/docs/current/connector/iceberg.html?utm_source=chatgpt.com#materialized-views
>>>>>>>>>
>>>>>>>>> If there are non-Iceberg tables in the view SQL, then the grace
>>>>>>>>> period will have to be based on last refresh which is also what Trino
>>>>>>>>> describes here:
>>>>>>>>> https://trino.io/docs/current/sql/create-materialized-view.html#mv-grace-period
>>>>>>>>>
>>>>>>>>> Should we call out both scenarios in the MV spec?  I think this is
>>>>>>>>> worth being explicit here.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Nov 18, 2025 at 11:03 AM Igor Belianski <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Re: max-staleness-ms interpretation
>>>>>>>>>> proposal:
>>>>>>>>>>    A Materialized View (MV) is considered fresh if and only if the
>>>>>>>>>> stored results are equivalent to those that would have been
>>>>>>>>>> obtained by running the MV's defining query at some point in time
>>>>>>>>>> within the interval:
>>>>>>>>>>  [current_time - max-staleness-ms, current_time]
>>>>>>>>>>
>>>>>>>>>> Note: this definition allows for the optimization proposed by
>>>>>>>>>> option 2 (implementing which is definitely a great idea), but
>>>>>>>>>> doesn't mandate it.
>>>>>>>>>>  One can also imagine some other optimizations that would be
>>>>>>>>>> possible given the definition above, and would be left up to the
>>>>>>>>>> engines to implement.
>>>>>>>>>>
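The interval definition above can be turned into a small check if one assumes the engine knows two things: the point in time the materialization reflects, and the timestamp of the first later change to any source. Both inputs and the `is_fresh_within` name are assumptions made for this sketch; the proposal itself does not say how an engine obtains them.

```python
HOUR_MS = 3_600_000


def is_fresh_within(refresh_ms, first_change_ms, now_ms, max_staleness_ms):
    """Return True if the stored results match the defining query at some
    point in [now_ms - max_staleness_ms, now_ms].

    refresh_ms: point in time the materialization reflects.
    first_change_ms: timestamp of the first source change after that point,
        or None if the sources have not changed since.
    """
    window_start = now_ms - max_staleness_ms
    if first_change_ms is None:
        # Sources unchanged: the stored results still match "now".
        return refresh_ms <= now_ms
    # Stored results are valid on [refresh_ms, first_change_ms), so they are
    # fresh when that range overlaps the allowed window.
    return refresh_ms <= now_ms and first_change_ms > window_start


now = 10 * HOUR_MS
# Refreshed at 2h, sources changed 30 minutes ago, 1h of staleness allowed.
print(is_fresh_within(2 * HOUR_MS, now - HOUR_MS // 2, now, HOUR_MS))
# Same refresh, but the sources changed 2 hours ago.
print(is_fresh_within(2 * HOUR_MS, 8 * HOUR_MS, now, HOUR_MS))
```

Under this reading, the age of the refresh itself does not matter; what matters is how long the sources have been ahead of the stored results, which corresponds to a grace period derived from snapshot history.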
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 18, 2025 at 10:54 AM Steven Wu <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> A reminder for tomorrow's community sync for the MV spec.
>>>>>>>>>>> https://calendar.app.google/T4zSk6qKWoy1vV6P7
>>>>>>>>>>>
>>>>>>>>>>> We have one open question from the last meeting on how
>>>>>>>>>>> `max-staleness-ms` should be interpreted. You can find more
>>>>>>>>>>> details in the meeting notes.
>>>>>>>>>>>
>>>>>>>>>>> https://docs.google.com/document/d/1EVCM-hKr5tY33t0Yzq37cAXSPncySc6Ghke7OZEcqXU/edit?tab=t.0#heading=h.75r8e0rwq02o
>>>>>>>>>>>
>>>>>>>>>>> Please also bring other topics that we should discuss.
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Nov 1, 2025 at 10:14 PM Steven Wu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the delay. Here are the recording and meeting notes
>>>>>>>>>>>> for the MV sync meeting on Wednesday, Oct 29.
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.google.com/document/d/1EVCM-hKr5tY33t0Yzq37cAXSPncySc6Ghke7OZEcqXU/edit?tab=t.0#heading=h.75r8e0rwq02o
>>>>>>>>>>>>
>>>>>>>>>>>> We have started to collect them in the above google doc.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 27, 2025 at 8:58 AM Péter Váry <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> If we have materialized views (MVs) and support for
>>>>>>>>>>>>> incremental change scans, then by introducing a Java-based
>>>>>>>>>>>>> representation of the view, we can expose a scan API that
>>>>>>>>>>>>> always returns up-to-date results for the MV.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The scan could include multiple tasks:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - A task for reading the current version of the MV.
>>>>>>>>>>>>>    - An incremental change log scan covering the range
>>>>>>>>>>>>>    between the snapshot ID of the source table at the time the
>>>>>>>>>>>>>    MV was last refreshed and its current snapshot ID, applying
>>>>>>>>>>>>>    the Java representation of the view when transformations
>>>>>>>>>>>>>    are required.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This approach allows us to build an always up-to-date index
>>>>>>>>>>>>> table/single source MV, using existing components.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Benny Chow <[email protected]> wrote (on Fri, Oct 24, 2025,
>>>>>>>>>>>>> 7:44):
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the current proposal would support your example.  In
>>>>>>>>>>>>>> most situations, replace table operations after a view is
>>>>>>>>>>>>>> materialized wouldn’t invalidate the materialization.
>>>>>>>>>>>>>> However, if the view includes metadata columns, then the
>>>>>>>>>>>>>> replace operations should invalidate the materialization.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This also brings up another important point that engines will
>>>>>>>>>>>>>> differ on what views can be materialized or not.  For example,
>>>>>>>>>>>>>> maybe metadata columns are not allowed, similar to
>>>>>>>>>>>>>> non-deterministic functions like random.  But some engines
>>>>>>>>>>>>>> like Dremio may allow views that use current date functions.
>>>>>>>>>>>>>> It should be possible for one engine to materialize a view and
>>>>>>>>>>>>>> another engine to look at the query tree and decide it’s not a
>>>>>>>>>>>>>> view it supports materializations on and choose not to use
>>>>>>>>>>>>>> that materialization.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Benny
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 23, 2025, at 8:44 AM, Péter Váry <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’ve been catching up on the discussion and wanted to share
>>>>>>>>>>>>>> an observation. One aspect that stands out to me in the
>>>>>>>>>>>>>> proposed staleness evaluation logic is that snapshots which
>>>>>>>>>>>>>> don’t modify data can still affect the view’s contents if the
>>>>>>>>>>>>>> view includes metadata columns.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was considering using a materialized view as an index for a
>>>>>>>>>>>>>> given table to accelerate the conversion of equality deletes
>>>>>>>>>>>>>> to position deletes. For example, the query might look like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *SELECT _POS, _FILE, id FROM target_table*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> During compaction, the materialized view would need to be
>>>>>>>>>>>>>> refreshed to ensure it reflects the correct data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does this seem like a valid use case? Or should we explicitly
>>>>>>>>>>>>>> exclude scenarios like this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steven Wu <[email protected]> wrote (on Mon, Oct 20, 2025,
>>>>>>>>>>>>>> 17:30):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Walaa,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > while Option 2 is described in your summary as "giving
>>>>>>>>>>>>>>> engines *flexibility* to determine freshness recursively
>>>>>>>>>>>>>>> beyond a source MV", that *isn’t achievable* under the MV
>>>>>>>>>>>>>>> evaluation model itself.
>>>>>>>>>>>>>>> Because each MV treats upstream MVs as physical tables,
>>>>>>>>>>>>>>> recursion stops at the first materialized boundary; *deeper
>>>>>>>>>>>>>>> staleness cannot be discovered without switching to a
>>>>>>>>>>>>>>> logical-view evaluation model, i.e., stepping outside the MV
>>>>>>>>>>>>>>> model altogether (note that in Option 3 we can determine
>>>>>>>>>>>>>>> recursive staleness while still inside the MV model).*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In option 2, when determining the freshness of mv_3, engines
>>>>>>>>>>>>>>> can choose to recursively evaluate the freshness of mv_1 and
>>>>>>>>>>>>>>> mv_2 since they are also MVs. But engines can also choose not
>>>>>>>>>>>>>>> to.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > This means that there seems to be an implicit “Option 3”.
>>>>>>>>>>>>>>> This option treats MVs as logical views, i.e., storing only
>>>>>>>>>>>>>>> view versions + base table snapshot IDs (no MV storage
>>>>>>>>>>>>>>> snapshot IDs, no per-path lineage).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the new option 3 you described, how could the engine
>>>>>>>>>>>>>>> update mv3's refresh state for base table_a and table_b?
>>>>>>>>>>>>>>> Unless all connected MVs are refreshed and committed in one
>>>>>>>>>>>>>>> single transaction, one entry per base table doesn't seem
>>>>>>>>>>>>>>> feasible. That's the main reason for option 1 to require the
>>>>>>>>>>>>>>> lineage path information in refresh state for base tables.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It also seems that option 3 can only interpret freshness
>>>>>>>>>>>>>>> recursively, while today there are engines that support MVs
>>>>>>>>>>>>>>> without recursively evaluating source MVs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 20, 2025 at 1:44 AM Walaa Eldin Moustafa <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Steven,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for organizing the series and summarizing the
>>>>>>>>>>>>>>>> outcome.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After re-reading the Option 1/2 proposal, initially I
>>>>>>>>>>>>>>>> interpreted Option 1 as simply expanding MVs like regular
>>>>>>>>>>>>>>>> logical views. On closer look, it is actually more complex.
>>>>>>>>>>>>>>>> It also preserves per-path lineage state (e.g., multiple
>>>>>>>>>>>>>>>> entries for the same base table via different parents),
>>>>>>>>>>>>>>>> which increases expressiveness but significantly increases
>>>>>>>>>>>>>>>> metadata complexity. So I agree it is not a practical option.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This means that there seems to be an implicit “Option 3”.
>>>>>>>>>>>>>>>> This option treats MVs as logical views, i.e., storing only
>>>>>>>>>>>>>>>> view versions + base table snapshot IDs (no MV storage
>>>>>>>>>>>>>>>> snapshot IDs, no per-path lineage).
>>>>>>>>>>>>>>>> Under this model, mv_3’s metadata might look like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Type   Name     Tracked State
>>>>>>>>>>>>>>>> -----  -------  -----------------------
>>>>>>>>>>>>>>>> view   mv_1     view_version_id
>>>>>>>>>>>>>>>> view   mv_2     view_version_id
>>>>>>>>>>>>>>>> table  table_a  table_snapshot_id
>>>>>>>>>>>>>>>> table  table_b  table_snapshot_id
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This preserves logical semantics and aligns MV behavior
>>>>>>>>>>>>>>>> with pure views.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *If we choose Option 2 (treat source MV as a materialized
>>>>>>>>>>>>>>>> table), we may have to consider these constraints:*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Staleness only degrades up the chain. mv_1 and mv_2 may
>>>>>>>>>>>>>>>> already be stale relative to the base tables, but if mv_3 is
>>>>>>>>>>>>>>>> refreshed using their storage snapshots, then mv_3 will be
>>>>>>>>>>>>>>>> marked as fresh under Option 2, even though all three MVs are
>>>>>>>>>>>>>>>> stale relative to the base tables.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Engines can no longer discover staleness beyond mv_1.
>>>>>>>>>>>>>>>> Once mv_3 sees mv_1 (or mv_2) as fresh based only on their
>>>>>>>>>>>>>>>> storage snapshots, it will not expand into mv_1 or mv_2 to
>>>>>>>>>>>>>>>> check whether they are stale relative to the base tables.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * If mv_2 and mv_3 were purely logical views instead of
>>>>>>>>>>>>>>>> MVs, they would evaluate directly against base tables and
>>>>>>>>>>>>>>>> return newer data. Under Option 2, the same definitions but
>>>>>>>>>>>>>>>> materialized upstream produce different data, not just
>>>>>>>>>>>>>>>> different metadata.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Therefore, while Option 2 is described in your summary as
>>>>>>>>>>>>>>>> "giving engines *flexibility* to determine freshness
>>>>>>>>>>>>>>>> recursively beyond a source MV", that *isn’t achievable*
>>>>>>>>>>>>>>>> under the MV evaluation model itself.
>>>>>>>>>>>>>>>> Because each MV treats upstream MVs as physical tables,
>>>>>>>>>>>>>>>> recursion stops at the first materialized boundary; *deeper
>>>>>>>>>>>>>>>> staleness cannot be discovered without switching to a
>>>>>>>>>>>>>>>> logical-view evaluation model, i.e., stepping outside the MV
>>>>>>>>>>>>>>>> model altogether (note that in Option 3 we can determine
>>>>>>>>>>>>>>>> recursive staleness while still inside the MV model).*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me know your thoughts. I slightly prefer Option 3. I’m
>>>>>>>>>>>>>>>> also fine with Option 2, but I don’t think the flexibility
>>>>>>>>>>>>>>>> to recursively determine freshness actually exists under its
>>>>>>>>>>>>>>>> evaluation model. Not sure if this changes anyone’s view,
>>>>>>>>>>>>>>>> but I wanted to clarify how I’m reading it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 8, 2025 at 11:11 PM Benny Chow <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I just listened to the recording.  I'm the tech lead for
>>>>>>>>>>>>>>>>> MVs at Dremio and responsible for both refresh management
>>>>>>>>>>>>>>>>> and query rewrites with MVs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It's great that we seem to agree that Iceberg MV spec
>>>>>>>>>>>>>>>>> won't require that MVs always be up to date in order to be
>>>>>>>>>>>>>>>>> usable for query rewrites.  There can be many data
>>>>>>>>>>>>>>>>> consistency issues (as Dan pointed out) but that is the
>>>>>>>>>>>>>>>>> state of affairs today.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It sounds like we are converging on the following
>>>>>>>>>>>>>>>>> scenarios for an engine to validate the MV freshness:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1.  Use storage table without any validation.  This might
>>>>>>>>>>>>>>>>> be the extreme "async MV" example.
>>>>>>>>>>>>>>>>> 2.  Ignore storage table even if one exists because SQL
>>>>>>>>>>>>>>>>> command or use case requires that.
>>>>>>>>>>>>>>>>> 3.  Use storage table only if data is not more than x
>>>>>>>>>>>>>>>>> hours old.  This can be achieved with the proposed
>>>>>>>>>>>>>>>>> refresh-start-timestamp-ms which is currently in the
>>>>>>>>>>>>>>>>> proposed spec.  For this to work with MVs built on MVs, we
>>>>>>>>>>>>>>>>> should probably state in the spec that if a MV is built on
>>>>>>>>>>>>>>>>> another MV, then it needs to inherit the
>>>>>>>>>>>>>>>>> refresh-start-timestamp-ms of the child MV.  In Steven's
>>>>>>>>>>>>>>>>> example, when building mv3, refresh-start-timestamp-ms
>>>>>>>>>>>>>>>>> needs to be set to the minimum of mv1 or mv2's
>>>>>>>>>>>>>>>>> refresh-start-timestamp-ms.  If this property name is
>>>>>>>>>>>>>>>>> confusing, we can rename it to
>>>>>>>>>>>>>>>>> "refresh-earliest-table-timestamp-ms".  I originally
>>>>>>>>>>>>>>>>> proposed this property and also listed out other benefits
>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/11041#discussion_r1779797796
>>>>>>>>>>>>>>>>> Also, at the time, MVs built on MVs weren't being
>>>>>>>>>>>>>>>>> considered.  Now that they are, I would recommend we have
>>>>>>>>>>>>>>>>> both "refresh-start-timestamp-ms" (when the refresh was
>>>>>>>>>>>>>>>>> started on the storage table) and
>>>>>>>>>>>>>>>>> "refresh-earliest-table-timestamp-ms" (used for freshness
>>>>>>>>>>>>>>>>> validation).
>>>>>>>>>>>>>>>>> 4.  Don't use the storage table if it is older than X
>>>>>>>>>>>>>>>>> hours.  This is what I had originally proposed for the
>>>>>>>>>>>>>>>>> *materialization.max-staleness-ms* view property here:
>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/11041#discussion_r1744837644
>>>>>>>>>>>>>>>>> It wasn't meant to validate the freshness but more to
>>>>>>>>>>>>>>>>> prevent use of a materialization after some criteria.
>>>>>>>>>>>>>>>>> 5.  Use storage table if recursive validation passes...
>>>>>>>>>>>>>>>>> i.e. refresh-state matches the current expanded query tree
>>>>>>>>>>>>>>>>> state.  This is what I think Steven is calling the
>>>>>>>>>>>>>>>>> "synchronous MV".
>>>>>>>>>>>>>>>>>
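The timestamp inheritance described in scenario 3 can be sketched as below. The function names and the way timestamps are passed in are assumptions for illustration; only the "take the minimum across source MVs" rule comes from the message above.

```python
HOUR_MS = 3_600_000


def earliest_table_timestamp_ms(base_table_timestamps_ms, source_mv_earliest_ms):
    """Earliest point in time among everything the new MV was built from.

    base_table_timestamps_ms: refresh-time timestamps for directly read tables.
    source_mv_earliest_ms: the inherited value of each source MV.
    Raises ValueError if both lists are empty.
    """
    return min([*base_table_timestamps_ms, *source_mv_earliest_ms])


def usable(earliest_ms, now_ms, max_age_ms):
    """Scenario 3: use the storage table only if no input is older than max_age_ms."""
    return now_ms - earliest_ms <= max_age_ms


# mv1 reflects data as of 1h, mv2 as of 3h; mv3 is refreshed later from both.
mv3_earliest = earliest_table_timestamp_ms([], [1 * HOUR_MS, 3 * HOUR_MS])
print(mv3_earliest == 1 * HOUR_MS)
print(usable(mv3_earliest, now_ms=5 * HOUR_MS, max_age_ms=2 * HOUR_MS))
```

In this example mv3 is rejected under a 2-hour limit even if its own refresh started minutes ago, because it inherited 1h-old data through mv1. That is the reason for keeping the inherited value separate from the time the refresh started.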
>>>>>>>>>>>>>>>>> For scenario 1-4, it would support the nice use case of an
>>>>>>>>>>>>>>>>> Iceberg client using a view's data through the storage
>>>>>>>>>>>>>>>>> table without needing to know how to parse/validate/expand
>>>>>>>>>>>>>>>>> any view SQLs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In Dremio's planner, we primarily use scenario 1 and 4
>>>>>>>>>>>>>>>>> together to determine MV validity for query rewrite.
>>>>>>>>>>>>>>>>> Scenario 2 and 5 also apply in certain situations.  For
>>>>>>>>>>>>>>>>> scenario 3, Dremio only exposes the
>>>>>>>>>>>>>>>>> "refresh-earliest-table-timestamp-ms" as an fyi to the user
>>>>>>>>>>>>>>>>> but it would be interesting to allow the user to set this
>>>>>>>>>>>>>>>>> time so that they could run queries and be 100% certain
>>>>>>>>>>>>>>>>> that they were not seeing data older than x hours.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Benny
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 8, 2025 at 3:37 PM Steven Wu <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> correction for a typo.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Prashanth brought up another scenario of
>>>>>>>>>>>>>>>>>> compaction/rewrite where a new snapshot was added *with*
>>>>>>>>>>>>>>>>>> actual data change
>>>>>>>>>>>>>>>>>> -->
>>>>>>>>>>>>>>>>>> Prashanth brought up another scenario of
>>>>>>>>>>>>>>>>>> compaction/rewrite where a new snapshot was added
>>>>>>>>>>>>>>>>>> *without* actual data change
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 8, 2025 at 2:12 PM Steven Wu <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks everyone for joining the MV discussion meeting.
>>>>>>>>>>>>>>>>>>> We will continue to have the recurring sync meeting on
>>>>>>>>>>>>>>>>>>> Wednesday 9 am (Pacific) every 3 weeks until we get to the
>>>>>>>>>>>>>>>>>>> finish line where Jan's MV spec PR [1] is merged. I have
>>>>>>>>>>>>>>>>>>> scheduled our next meeting on Oct 29 in the Iceberg dev
>>>>>>>>>>>>>>>>>>> events calendar.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here is the video recording for today's meeting.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1-nfhBPDWLoAFDu5cKP0rwLd_30HB6byR/view?usp=sharing
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We mostly discussed freshness evaluation. Here is the
>>>>>>>>>>>>>>>>>>> meeting summary.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    1. For tracking the refresh state for the source MV
>>>>>>>>>>>>>>>>>>>    [2], the consensus is option 2 (treating source MV as a
>>>>>>>>>>>>>>>>>>>    materialized table) which would give engines the
>>>>>>>>>>>>>>>>>>>    flexibility on freshness determination (recursive beyond
>>>>>>>>>>>>>>>>>>>    source MV or not).
>>>>>>>>>>>>>>>>>>>    2. Earlier design doc [3] discussed max staleness
>>>>>>>>>>>>>>>>>>>    config. But it wasn't reflected in the spec PR. The
>>>>>>>>>>>>>>>>>>>    general opinion is to add the config to the spec PR. The
>>>>>>>>>>>>>>>>>>>    open question is whether the
>>>>>>>>>>>>>>>>>>>    `materialization.max-staleness-ms` config should be
>>>>>>>>>>>>>>>>>>>    added to the view metadata or the storage table
>>>>>>>>>>>>>>>>>>>    metadata. Either can work. We just need to decide which
>>>>>>>>>>>>>>>>>>>    makes a little better fit.
>>>>>>>>>>>>>>>>>>>    3. Prashanth brought up schema change with default
>>>>>>>>>>>>>>>>>>>    value and how it may affect the MV refresh state (for
>>>>>>>>>>>>>>>>>>>    SQL representation with select *). Jan mentioned that
>>>>>>>>>>>>>>>>>>>    snapshot contains schema id when the snapshot was
>>>>>>>>>>>>>>>>>>>    created. Engine can compare the snapshot schema id to
>>>>>>>>>>>>>>>>>>>    the source table schema id during freshness evaluation.
>>>>>>>>>>>>>>>>>>>    There is no need for additional schema info in
>>>>>>>>>>>>>>>>>>>    refresh-state tracking in the storage table.
>>>>>>>>>>>>>>>>>>>    4. Prashanth brought up another scenario of
>>>>>>>>>>>>>>>>>>>    compaction/rewrite where a new snapshot was added with
>>>>>>>>>>>>>>>>>>>    actual data change. The general take is that the engine
>>>>>>>>>>>>>>>>>>>    can optimize and decide that MV is fresh as the new
>>>>>>>>>>>>>>>>>>>    snapshot doesn't have any data change.
>>>>>>>>>>>>>>>>>>>
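The compaction case in item 4 (a new snapshot with no logical data change, per the correction posted later in the thread) might be handled along these lines. Iceberg snapshots carry an `operation` in their summary, and `replace` is the value used for rewrites that leave table data unchanged. The dictionary layout and the `fresh_despite_new_snapshots` helper are hypothetical, and the sketch assumes the view does not select metadata columns such as file path or row position, which a rewrite does change.

```python
def fresh_despite_new_snapshots(snapshots_by_id, current_id, recorded_id):
    """Walk from the current snapshot back to the one recorded at refresh time.

    The MV is treated as fresh only if every snapshot in between is a
    "replace" (a rewrite without a logical data change).
    """
    snapshot_id = current_id
    while snapshot_id != recorded_id:
        snapshot = snapshots_by_id.get(snapshot_id)
        if snapshot is None or snapshot["operation"] != "replace":
            return False
        snapshot_id = snapshot["parent_id"]
    return True


snapshots = {
    1: {"parent_id": None, "operation": "append"},
    2: {"parent_id": 1, "operation": "replace"},  # compaction
    3: {"parent_id": 2, "operation": "append"},   # new rows
}

print(fresh_despite_new_snapshots(snapshots, current_id=2, recorded_id=1))
print(fresh_despite_new_snapshots(snapshots, current_id=3, recorded_id=1))
```

If the recorded snapshot is no longer an ancestor of the current one (for example after a rollback or snapshot expiration), the walk runs out of parents and the function conservatively reports the MV as not fresh.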
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We can add some clarifications in the spec PR for
>>>>>>>>>>>>>>>>>>> freshness evaluation based on the above discussions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/11041
>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1_StBW5hCQhumhIvgbdsHjyW0ED3dWMkjtNzyPp9Sfr8/edit?tab=t.0
>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?tab=t.0#heading=h.3wigecex0zls
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Sep 25, 2025 at 9:27 AM Steven Wu <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Iceberg materialized view has been discussed in the
>>>>>>>>>>>>>>>>>>>> community for a long time. Thanks Jan Kaul for driving the
>>>>>>>>>>>>>>>>>>>> discussion and the spec PR. It has been stalled for a long
>>>>>>>>>>>>>>>>>>>> time due to lack of consensus on 1 or 2 topics. In Wed's
>>>>>>>>>>>>>>>>>>>> Iceberg community sync meeting, Talat brought up the
>>>>>>>>>>>>>>>>>>>> question on how to move forward and if we can have a
>>>>>>>>>>>>>>>>>>>> dedicated meeting for MV.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have set up a meeting on *Oct 8 (9-10 am Pacific)*.
>>>>>>>>>>>>>>>>>>>> If you subscribe to the "Iceberg Dev Events" calendar,
>>>>>>>>>>>>>>>>>>>> you should be able to see it. If not, here is the link:
>>>>>>>>>>>>>>>>>>>> https://meet.google.com/nfe-guyq-pqf
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We are going to discuss
>>>>>>>>>>>>>>>>>>>> * remaining open questions
>>>>>>>>>>>>>>>>>>>> * unresolved concerns
>>>>>>>>>>>>>>>>>>>> * the next step and hopefully some consensus on moving
>>>>>>>>>>>>>>>>>>>> forward
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> MV spec PR is up to date. Jan has incorporated recent
>>>>>>>>>>>>>>>>>>>> feedback. This should be the base of the discussion.
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/11041
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Dev discussion thread (a long-running thread started by
>>>>>>>>>>>>>>>>>>>> Jan).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread/y1vlpzbn2x7xookjkffcl08zzyofk5hf
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The mail archive has broken lineage and doesn't show
>>>>>>>>>>>>>>>>>>>> all replies. Email subject is "*[DISCUSS] Iceberg
>>>>>>>>>>>>>>>>>>>> Materialzied Views*".
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
