Re: Iceberg MV Refresh

2024-06-24 Thread Benny Chow
Thanks Piotr. I agree with both points. I added a doc comment to clarify both the description and name for this property. Hopefully, we're all in sync now: https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?disco=AAABFwRPGoA On Mon, Jun 24, 2024 at 4:58 AM P

Re: Iceberg MV Refresh

2024-06-24 Thread Piotr Findeisen
Hi, For the MV to be useful, the grace period (max staleness) should be part of the materialized view definition. Ultimately it's the query engine's responsibility to implement grace period behavior correctly, but the engine needs to know what amount of staleness is OK for this particular view and that's

Re: Iceberg MV Refresh

2024-06-21 Thread Benny Chow
Hi Dan, it looks like this is pretty common across engines and is sometimes part of the engine-specific DDL operation to create the MV. So I agree, let's keep the "materialization.max-staleness" property. I'll also point out that when the clock starts for this max staleness check could be either when t

Re: Iceberg MV Refresh

2024-06-21 Thread Daniel Weeks
Benny, I think you bring up a good point about staleness in that different clients may want different behaviors. However, defining a "grace period" or "max staleness" is pretty common and makes a lot of sense when working with expensive queries. Trino

Re: Iceberg MV Refresh

2024-06-20 Thread Jan Kaul
Thanks Benny for bringing these issues up. I would agree with both of your propositions. Regarding the naming of the fields, we can go with the naming that you suggested. I just wanted to wait and see whether more people chime in with their opinions. Jan On 20.06.24 23:16, Benny Chow wrote: > So ba

Re: Iceberg MV Refresh

2024-06-20 Thread Benny Chow
> So basically this is just FYI and it is up to the consumer to assume what to do given the length of time between that timestamp and now? Yes. Some consumers will use it, some won't. Only the engine producing the materialization will know when it started the refresh job. > Would the decision

Re: Iceberg MV Refresh

2024-06-20 Thread Walaa Eldin Moustafa
So basically this is just FYI and it is up to the consumer to assume what to do given the length of time between that timestamp and now? Would the decision be on the engine implementation side (which does not have the domain knowledge), or would it be on the user's side? In the latter case the user

Re: Iceberg MV Refresh

2024-06-20 Thread Benny Chow
Piotr, thanks for the Trino pointers. I noticed that Trino stores the refresh start time as a snapshot summary property here. I thi

Re: Iceberg MV Refresh

2024-06-20 Thread Walaa Eldin Moustafa
Benny, is the suggestion to couple the "refresh-start-timestamp-ms" property with a grace period as well? Also, could you clarify which timestamp "refresh-start-timestamp-ms" refers to: (1) Timestamp when refresh is triggered (2) Timestamp when refresh is concluded and the snapshot is written. Als

Re: Iceberg MV Refresh

2024-06-20 Thread Piotr Findeisen
Hi Benny, on the staleness topic I'd recommend checking how Trino implements materialized views in Iceberg and how it defines staleness. In particular, a view can have a defined grace period which defines how stale the data can be for the materialization to be considered useful (defaults to unlimi
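
[Editor's illustration, not part of the original message] The grace period idea discussed in this thread can be sketched as a simple comparison. This is only a minimal sketch under stated assumptions: the function and parameter names are hypothetical, the clock is assumed to start at the refresh start timestamp (the thread leaves open which timestamp should be used), and a missing grace period is assumed to mean no limit. It does not reflect any engine's actual implementation.

```python
from typing import Optional


def is_within_grace_period(
    refresh_start_ms: int,
    now_ms: int,
    max_staleness_ms: Optional[int] = None,
) -> bool:
    """Return True if the materialization is still considered usable.

    All names are hypothetical. Assumptions:
    - refresh_start_ms is the epoch-millisecond time the refresh began,
    - max_staleness_ms is the grace period; None means no limit.
    """
    if max_staleness_ms is None:
        # No grace period defined: treat the materialization as always usable.
        return True
    # Age of the materialized data, measured from the assumed clock start.
    age_ms = now_ms - refresh_start_ms
    return age_ms <= max_staleness_ms


# Example: a one-hour grace period checked 30 and 90 minutes after refresh start.
ONE_HOUR_MS = 60 * 60 * 1000
fresh = is_within_grace_period(0, 30 * 60 * 1000, ONE_HOUR_MS)
stale = is_within_grace_period(0, 90 * 60 * 1000, ONE_HOUR_MS)
```

An engine that finds the materialization outside the grace period could fall back to running the view's query directly; that choice is outside the scope of this sketch.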

Iceberg MV Refresh

2024-06-19 Thread Benny Chow
Hey Guys, Great progress on the MV spec and thanks a ton to Jan and Walaa for driving this. One of our latest achievements was that we finalized the view lineage and materialization table refresh JSON so that we can definitively and concisely describe what data is in the materialization table. R