I'm generally in favor of approach #1 with UUID in lineage.I think it's helpful to know if the underlying table changes (e.g. identifier remains the same, but the table was changed). However, I'm not sure what the behavior would be in that case. Any refresh at that point would not be able to produce a match between the lineage information and the state (without updating the view lineage information).If we want to support operations like RTAS or Drop/Create on tables referenced by a view, #1 might not work. My understanding is that most database systems wouldn't allow that, but it could also be hard to enforce without a more complicated catalog implementation.-DanOn Tue, Sep 3, 2024 at 11:21 AM Walaa Eldin Moustafa <wa.moustafa@gmail.com> wrote:Hi Steven,Thanks! I see the PR now introduces UUID in the state information. That is great progress. I still have slight preference on where to place UUID (in lineage vs state), which I summarized in this doc [1] as promised. I wrote the doc before UUID was added in the PR, so it still compares with the approach that does not leverage UUIDs at all. So in summary the doc compares between three approaches:(1) With UUID in lineage.(2) With UUID in state (current approach in the PR).(3) Without UUID at all (previous state before the recent PR update).I have slight preference for (1) as I think it cleaner and useful for future applications (as outlined in the doc).In regards to how to proceed, I think technically we will need a vote. Timing of vote depends on when the contributor creating the vote is comfortable with starting it.Thanks,Walaa.On Tue, Sep 3, 2024 at 10:40 AM Steven Wu <stevenz3wu@gmail.com> wrote:Walaa, for the listed discussion points, how should we move forward? should we have another MV sync meeting?BTW, Jan's latest spec PR addressed my comment on UUID.On Mon, Sep 2, 2024 at 4:35 PM Walaa Eldin Moustafa <wa.moustafa@gmail.com> wrote:Hi Jan,I think we need further discussion for a few reasons:* Semantic issues are significant, especially in a spec. They could create gaps that are very hard to fix in the future. I would rather spend more time designing them properly than fix the gaps with band-aid solutions in the future.* Beyond the arguments based on semantics, there are technical differences between SQL/catalog table identifier and UUID. We need to dive into them and understand which construct is best suited for which use cases.* After we have asked for further feedback in the previous thread, Steven's suggestion incorporated UUIDs [1] (not in the key, but in the value, so we should consider that too, and understand the difference between using them in the value or in the key).* Ryan's suggestion [2] on the MV issue contained UUID in the value (so same like the above).* The design before the MV community sync up already used the table SQL/catalog identifiers as the only way to reference the child tables/views in the storage table. During the meeting we agreed to split up this info. As a follow-up the split used sequence numbers (so we did go with the split, and not using SQL identifiers directly), but we have shown that the split using sequence IDs does not work [3].Thanks,Walaa.On Mon, Sep 2, 2024 at 3:24 AM Jan Kaul <jankaul@mailbox.org.invalid> wrote:Hi Walaa,
in my opinion the UUID vs table identifier discussion comes down to the question: do we introduce table identifiers in an opaque struct in the table snapshot summary or do we accept technical drawbacks?
Using UUIDs with a lineage struct in the view metadata leads to one of the following drawbacks:
- If we don't fully expand the query tree in the lineage, we have to expand the lineage on every read to determine freshness
- If we fully expand the query tree in the lineage, the view version must be updated every time a child view changes
With table identifiers we don't have any technical drawbacks. The fact that the query tree has to be expanded for the refresh operation is not a drawback, as this also has to be done for UUIDs with a not fully expanded query tree.
Again, the question comes down to: do the technical drawbacks outweigh introducing table identifiers in the table snapshot summary?
In my opinion yes. And if I understand correctly, most of the other people participating in the discussion have a similar stance.I'm not sure if we really need another document for the discussion.
Best wishes,
Jan
On 29.08.24 21:10, Walaa Eldin Moustafa wrote:
Hi Jan,
I think we need to close the discussion on the UUID vs table identifier options and possibly cast a vote before having a productive discussion on the PR. I did not get a chance yet to post the document on the UUID vs table identifier discussion. I will do that by next week.
Thanks,Walaa.
On Thu, Aug 29, 2024 at 5:31 AM Jan Kaul <jankaul@mailbox.org.invalid> wrote:
Hi all,
to move the Iceberg Materialzied View Proposal forward, I created a PR
(https://github.com/apache/iceberg/pull/11041) that adds a section on
Materialized Views to the View Spec. I hope we can resolve any remaining
questions there, before we can start the voting process for the Proposal.
It would be great if you could have a look.
Best wishes,
Jan
Hi all,
I would like to mention that Walaa's document doesn't mention any downsides of Approach #1 in the comparison.
Approach #1 has downsides when views are referenced in the materialized view query. These problems arise because only direct children are stored in the lineage. This leads to:
- The lineage of child views has to be expanded everytime the freshness of the materialized views is to be determined. Resulting in recursive catalog calls at read time
- the lineage field has to be present in child views, otherwise it's not possible to obtain the table identifiers.
Thanks,
Jan
Am 04.09.2024 06:28 schrieb Daniel Weeks <dwe...@apache.org>:
- [DISCUSS] Iceberg Materialzied Views Jan Kaul
- Re: [DISCUSS] Iceberg Materialzied Views Walaa Eldin Moustafa
- Re: [DISCUSS] Iceberg Materialzied Views Jan Kaul
- Re: [DISCUSS] Iceberg Materialzied Views Walaa Eldin Moustafa
- Re: [DISCUSS] Iceberg Materialzied V... Steven Wu
- Re: [DISCUSS] Iceberg Materialz... Walaa Eldin Moustafa
- Re: [DISCUSS] Iceberg Mater... Daniel Weeks
- Re: [DISCUSS] Iceberg Materialzied Views Jan Kaul
- Re: [DISCUSS] Iceberg Materialzied Views Walaa Eldin Moustafa
- Re: [DISCUSS] Iceberg Materialzied Views Benny Chow
- Re: [DISCUSS] Iceberg Materialzied V... Benny Chow
- Re: [DISCUSS] Iceberg Materialz... Jan Kaul
- Re: [DISCUSS] Iceberg Mater... Benny Chow
- Re: [DISCUSS] Iceberg M... Jan Kaul
- Re: [DISCUSS] Iceberg M... Walaa Eldin Moustafa
- Re: [DISCUSS] Iceberg M... Jan Kaul
- Re: [DISCUSS] Iceberg M... Benny Chow
- Re: [DISCUSS] Iceberg M... Jan Kaul