Looks good, Jan. I'm a bit nitpicky about choosing good names, so I left some comments around that to see what others think.
Thanks
On Fri, Jun 7, 2024 at 2:26 AM Jan Kaul <jankaul@mailbox.org.invalid> wrote:
Thanks Benny and Walaa for your input. I updated the doc to account for the changes as far as I understood them. I would appreciate it if you had a look and gave me some feedback.
If you have some open comments that are not relevant anymore due to the changes, please close them so that we can clean up the comments section a bit.
Regards,
Jan
On 07.06.24 08:33, Walaa Eldin Moustafa wrote:
*lineagestate JSON structure
On Thu, Jun 6, 2024 at 11:31 PM Walaa Eldin Moustafa <wa.moustafa@gmail.com> wrote:
Hi Benny,
Your understanding is correct.
Another point that we discussed was the type of APIs engines can use to conveniently update the storage table with view query results as well as set the snapshot summary on the output snapshot (one that was produced by the update). We will follow up on that separately.
Jan, do you want to reflect the lineage + state discussion in the doc so we can iterate on the lineage JSON structure?
Thanks,
Walaa.
On Thu, Jun 6, 2024 at 9:40 PM Benny Chow <btchow@gmail.com> wrote:
I really enjoyed listening to the replay and hearing everyone's feedback! I'm in agreement with all 3 consensus items, especially around Dan's idea to separate the view's query tree lineage vs materialization's lineage state.
I'll summarize my understanding about the distinction and add a few comments:
Materialized View's Query Tree Lineage
- It's basically the SQL representation converted to a distinct list of tables and views.
- Stored inside view versions so if you change the view SQL, you can include the lineage with that change.
- Tables support time travel so they can optionally include a ref type and name/timestamp.
- Views would NOT include the version (that's part of the materialization lineage state below).
- I think we should use fully qualified identifiers here instead of UUIDs. Dropping and re-creating a referenced table or view doesn't break the view SQL so the lineage should not be broken either. I also don't think we can support time travel if we used table UUIDs here.
- Each table or view can be assigned a unique sequence number. This sequence number is scoped to a single view version.
Materialization Lineage State
- It's basically a lookup table for the above sequence number to either a table snapshot id or view version that was used at the time of creating/refreshing the storage table. For views, these are nested views within the MV's query tree, not the MV itself.
- Stored inside the table's snapshot summary.
- Additional property "refresh-version-id" to identify the MV's version.
In order to validate the freshness of a materialization, everything above has to be checked against the latest tables and views. This should cover all data and query tree changes (that I can think of) such as the "limit 100" example I gave in Slack.
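To make the freshness check concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not part of the proposal: the `LineageEntry` class, the `is_fresh` function, the example identifiers, and the `current_pointers` lookup (a stand-in for asking the catalog for each source's latest snapshot id or view version id).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LineageEntry:
    """One source in the view's query tree lineage (hypothetical shape)."""
    sequence_number: int  # unique within a single view version
    identifier: str       # fully qualified name of a table or nested view


def is_fresh(
    lineage: List[LineageEntry],
    lineage_state: Dict[int, int],
    refresh_version_id: int,
    current_version_id: int,
    current_pointers: Dict[str, int],
) -> bool:
    """Return True only if nothing changed since the last refresh."""
    # The MV's own definition changed (e.g. a "limit 100" was added).
    if refresh_version_id != current_version_id:
        return False
    for entry in lineage:
        # Value recorded at refresh time: snapshot id or view version id.
        recorded = lineage_state.get(entry.sequence_number)
        # Latest value for the same source, resolved by identifier.
        current = current_pointers.get(entry.identifier)
        if recorded is None or current is None or recorded != current:
            return False
    return True


lineage = [
    LineageEntry(1, "catalog.db.orders"),              # a table
    LineageEntry(2, "catalog.db.recent_orders_view"),  # a nested view
]
state = {1: 1001, 2: 4}

unchanged = {"catalog.db.orders": 1001, "catalog.db.recent_orders_view": 4}
new_snapshot = {"catalog.db.orders": 1002, "catalog.db.recent_orders_view": 4}

print(is_fresh(lineage, state, 7, 7, unchanged))     # True
print(is_fresh(lineage, state, 7, 7, new_snapshot))  # False
print(is_fresh(lineage, state, 7, 8, unchanged))     # False
```

Because the lineage is keyed by identifier rather than UUID, a dropped and re-created source resolves to a different snapshot id or version id and the materialization is simply reported as stale.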
Please let me know your thoughts.
Thanks
On Thu, Jun 6, 2024 at 7:53 AM <russell.spitzer@gmail.com> wrote:
Thanks for hosting; it was a very helpful meeting. I really hope we can do more in the future to accelerate consensus on other proposals.
I do encourage anyone on the mailing list to add your comments offline as well, especially if you have strong feelings. Iceberg is an open project and we realize not everyone can attend virtual meetings and want you to know you are welcome.
On Jun 6, 2024, at 7:11 AM, Jan Kaul <jankaul@mailbox.org.invalid> wrote:
Hi all,
thanks to all of you who attended the meeting yesterday! It was great to talk to you and I think we made great progress. For those of you who weren't able to attend the meeting, I summarized the main points below:
Question 1: Should we store the "storage table pointer" as a view property or as an additional field in the view metadata?
We reached consensus to add a new metadata field "storage-table" to the view version record that stores the identifier of the storage table. The motivation for introducing a new field is that this emphasizes that materialized views are part of the standard and it enforces a common behavior.
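As a rough illustration, a view version record carrying the new field might look like the sketch below. Only the field name "storage-table" comes from the meeting; the shape of its value (a namespace plus a name) and all example values are assumptions, since the exact representation was not decided here.

```python
import json

# Hypothetical view version record. The fields other than "storage-table"
# follow the existing Iceberg view spec; the values are made up.
view_version = {
    "version-id": 3,
    "schema-id": 1,
    "timestamp-ms": 1717660800000,
    "summary": {"engine-name": "example-engine"},
    "default-namespace": ["db"],
    "representations": [
        {
            "type": "sql",
            "sql": "SELECT region, count(*) FROM orders GROUP BY region",
            "dialect": "spark",
        }
    ],
    # New field agreed on in the meeting. The value shape is an assumption.
    "storage-table": {"namespace": ["db"], "name": "orders_by_region_storage"},
}

print(json.dumps(view_version, indent=2))
```

Since the field lives in the view version rather than in free-form properties, every engine reading the view finds the storage table in the same place.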
Question 2: Where should the lineage-state information be stored?
We reached consensus on storing the lineage-state information in the snapshot summary of the storage table. The motivation behind this is that the table spec should not be concerned with defining view constructs.
Question 3: How should the lineage-state information be represented?
We reached consensus on representing the lineage-state in the form of nested objects and storing these as a JSON-encoded string inside the storage table snapshot summary.
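A small sketch of what this could look like, given that snapshot summary values are plain strings: the nested objects are serialized to a JSON string before being stored and parsed again when read. The summary key "lineage-state", the object layout, and all values are assumptions for illustration; the actual structure is still to be defined in the doc.

```python
import json

# Hypothetical nested lineage-state: one object per source, recording the
# snapshot id or view version id seen at the time of the refresh.
lineage_state = {
    "sources": [
        {"identifier": "catalog.db.orders", "snapshot-id": 1001},
        {"identifier": "catalog.db.recent_orders_view", "version-id": 4},
    ]
}

# The summary is a string-to-string map, so the nested structure has to be
# JSON-encoded into a single string value.
snapshot_summary = {
    "operation": "overwrite",
    "lineage-state": json.dumps(lineage_state),  # key name is an assumption
}

# A reader decodes the string back into nested objects.
decoded = json.loads(snapshot_summary["lineage-state"])
print(decoded["sources"][0]["snapshot-id"])
```

This keeps the table spec unaware of view constructs: to the table, the lineage-state is just one more opaque summary entry.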
Additionally, Dan proposed to introduce a new lineage construct as part of the view definition in addition to the lineage-state that is part of the storage table. The idea is to separate the concerns. The lineage-state in the storage table should only capture the state of the source tables at the time of the last refresh, whereas the lineage information in the view contains more information about the source tables and is responsible for resolving the identifiers. We haven't really decided on how the new lineage construct should be represented or integrated into the view metadata.
One point that we didn't really have the time to discuss was Benny's comment of also storing the version-id of views in the case that the materialized view is referencing a view. I think we should also integrate that into the spec.
You can find the recording of the meeting here:
https://drive.google.com/file/d/1DE09tYS28L3xL_NgnM9g0Olbe6aHza5G/view?usp=sharing
Best wishes,
Jan
No that's great, thank you. I'm thankful for the input.
Jan
On 07.06.2024 17:53, Benny Chow <btc...@gmail.com> wrote: