Re: Materialized Views: Next Steps

2024-05-20 Thread Daniel Weeks
Walaa, I think Ryan's comment was more in relation to the combined metadata approach (which is not the current proposal). I don't think anything that's being discussed at this point helps or hurts the view integration story with catalog APIs. -Dan On Mon, May 20, 2024 at 1:21 PM Walaa Eldin Mo

Re: Materialized Views: Next Steps

2024-05-20 Thread Daniel Weeks
I know I'm coming in late here, but I'm still working through all the prior discussion. Here are my thoughts so far: I agree that Jan's doc has some really good context and we should continue from there, but we should remove discussion of options as it just creates confusion. We can reference ot

Re: Materialized Views: Next Steps

2024-05-20 Thread Jack Ye
Sorry, I sidetracked the discussion a bit. For time-travel-related design, I think we can move to https://github.com/apache/iceberg/pull/10280#discussion_r1596091824 to discuss further. My point in describing that was mostly to say that it seems sufficient to use properties to describe those time travel cases

Re: Materialized Views: Next Steps

2024-05-17 Thread Walaa Eldin Moustafa
I am not in favor of expanding the spec for use cases that do not directly serve materialized views. Identifying general lineage is a separate problem that is also applicable to non-materialized views so maybe that’s worth discussing in a separate spec. If there is a use case for timestamp or snaps

Re: Materialized Views: Next Steps

2024-05-17 Thread Benny Chow
I think it’s still worthwhile to include the snapshot and timestamp refs for completeness' sake. Also, Jan brought up an interesting use case with a BI tool using the MV without a SQL representation. The BI tool can get all table and view dependencies if the lineage is complete. Thanks. On May 17, 2024, at

Re: Materialized Views: Next Steps

2024-05-16 Thread Benny Chow
Sounds good. Another benefit of the struct model is that it's more extensible in the future when we need to disambiguate the same table that appears multiple times in the MV query tree. This could happen with time travel queries or branching. We may end up adding additional properties like a sequ
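To make the struct model's extensibility concrete, here is a minimal Python sketch. The class and field names (`SourceState`, `RefreshState`, `ref`) are hypothetical and do not come from the spec PR; the sketch only assumes that each occurrence of a source table in the MV query tree gets its own typed entry, so the same table read twice (for example, once from main and once from a branch) does not collide the way a flat key derived from the table name alone would.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceState:
    """State of one source-table occurrence in the MV query tree (hypothetical)."""
    identifier: str            # e.g. "db.orders"
    snapshot_id: int           # snapshot read at refresh time
    ref: Optional[str] = None  # hypothetical: branch or tag the query reads, if any


@dataclass(frozen=True)
class RefreshState:
    """Typed refresh lineage kept as one structured field (hypothetical)."""
    view_version_id: int
    sources: Tuple[SourceState, ...]


# The same table read twice, once from main and once from a branch:
# the entries stay distinct because each one carries its own ref.
state = RefreshState(
    view_version_id=3,
    sources=(
        SourceState("db.orders", 101),
        SourceState("db.orders", 90, ref="audit"),
    ),
)
```

Adding a further disambiguating field later (such as the sequence-style property mentioned above) would be one more optional attribute on `SourceState` rather than a new key-naming convention.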

Re: Materialized Views: Next Steps

2024-05-16 Thread Walaa Eldin Moustafa
Hi Benny, I have responded to the comment. I would suggest that we use this thread to evaluate properties model vs top level metadata model (to avoid discussion drift). If we have feedback on the actual properties used in the properties model as defined in the PR, we can have the discussion there

Re: Materialized Views: Next Steps

2024-05-16 Thread Benny Chow
Hi Walaa I left comments in your spec PR: https://github.com/apache/iceberg/pull/10280#pullrequestreview-2061922169 My last question about use cases was really about incremental refresh with aggregates. But I think this might be too complicated to try to model/discuss now and so I agree with Mic

Re: Materialized Views: Next Steps

2024-05-14 Thread Walaa Eldin Moustafa
Thanks Benny. My specific thoughts about the spec and the properties are captured in the spec PR https://github.com/apache/iceberg/pull/10280. The spec is also implemented in the Spark implementation PR https://github.com/apache/iceberg/pull/9830, and I believe this follows the same nature of how t

Re: Materialized Views: Next Steps

2024-05-14 Thread Benny Chow
I agree with Szehon here. I think storing the materialization lineage as a bunch of properties is brittle. This lineage information is needed by engines to validate the staleness of a materialization and also to perform full or incremental refreshes. There’s a lot to capture here. Maybe we shoul

Re: Materialized Views: Next Steps

2024-05-14 Thread Walaa Eldin Moustafa
Thanks John. The current metadata does not sound complex. We need to track the underlying table snapshot IDs as well as the view version ID. I agree that, as long as it is simple and until this feature fully matures, we should track it in properties. One important factor for me (apart from the API effo
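A minimal sketch of the properties approach, assuming the storage table records the view version ID and each source table's snapshot ID as flat string properties at refresh time. The key names below (`refresh.view-version-id`, `refresh.snapshot-id.<table>`) and both helper functions are hypothetical illustrations, not the keys defined in the spec PR.

```python
from typing import Dict

# Hypothetical property keys; the real names are defined in the spec PR, not here.
REFRESH_VIEW_VERSION_KEY = "refresh.view-version-id"
REFRESH_SNAPSHOT_PREFIX = "refresh.snapshot-id."


def record_refresh(view_version_id: int, source_snapshots: Dict[str, int]) -> Dict[str, str]:
    """Encode refresh state as flat string properties on the storage table."""
    props = {REFRESH_VIEW_VERSION_KEY: str(view_version_id)}
    for table_name, snapshot_id in source_snapshots.items():
        props[REFRESH_SNAPSHOT_PREFIX + table_name] = str(snapshot_id)
    return props


def is_fresh(props: Dict[str, str], current_view_version_id: int,
             current_snapshots: Dict[str, int]) -> bool:
    """Fresh only if the view version and every source snapshot still match
    what was recorded at refresh time."""
    if props.get(REFRESH_VIEW_VERSION_KEY) != str(current_view_version_id):
        return False
    # Recover the per-table snapshot IDs by parsing the key prefix.
    recorded = {
        key[len(REFRESH_SNAPSHOT_PREFIX):]: int(value)
        for key, value in props.items()
        if key.startswith(REFRESH_SNAPSHOT_PREFIX)
    }
    return recorded == current_snapshots


props = record_refresh(3, {"db.orders": 101, "db.customers": 202})
print(is_fresh(props, 3, {"db.orders": 101, "db.customers": 202}))
print(is_fresh(props, 3, {"db.orders": 105, "db.customers": 202}))
```

Every value round-trips through a string and the set of source tables is recovered by parsing key prefixes. That is simple to implement, but it is also the kind of encoding the brittleness concern in this thread is about.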

Re: Materialized Views: Next Steps

2024-05-14 Thread John Zhuge
Hi Szehon, While I fully share your concern about abusing table properties, we took the approach of option 1 and have run it in production for several years: - the feature was still evolving - quick and simple implementation - table properties are simple enough and not confusing - haven't see

Re: Materialized Views: Next Steps

2024-05-10 Thread Walaa Eldin Moustafa
Hi Szehon, Thanks for the follow-up. It is possible some of the concerns were referring to the backend catalogs, but it is all connected. My main personal concern is from the engine connector APIs point of view, but I share the concern about the catalogs. I think everyone's concern is not about t

Re: Materialized Views: Next Steps

2024-05-10 Thread Szehon Ho
Hi Walaa, OK, thanks for confirming. I am still not 100% in agreement. My understanding of the rationale for separate Table/View objects in the comment that you linked: I think the biggest problem with this is that we would need to modify every catalog to support this combination and that would

Re: Materialized Views: Next Steps

2024-05-09 Thread Walaa Eldin Moustafa
Hi Szehon, Yes, you are reading the PR correctly, and interpreting the meaning of properties correctly. I think the reply you pasted from Ryan refers to the same concept as well. For the initial Google doc and the issue (by the way it is an issue, not a PR), yes both are proposing new metadata fi

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
Hi Walaa As there may be confusion in the word 'properties', I want to double check if we are talking about the same thing here. I am reading your PR as adding lineage metadata as new key/value pair under the storage Table's 'properties' field: https://github.com/apache/iceberg/blob/main/format/s

Re: Materialized Views: Next Steps

2024-05-09 Thread Walaa Eldin Moustafa
Hi Szehon, I think choosing separate view + table objects precludes us from adding new metadata to table and view metadata. Here is one relevant comment [1] from Ryan on the modeling doc, where his point is that we want to avoid introducing new APIs since it requires updating every catalog, and (q

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
Hi Walaa, I agree, I definitely do not want yet another PR/doc where discussion happens, as it's already quite spread out :) But I did want to clarify some points before we get started on the discussion on your PR. With reusing the table and view objects, we are not changing the existing meta

Re: Materialized Views: Next Steps

2024-05-09 Thread Walaa Eldin Moustafa
Thanks Szehon. The reason for the difference is that the proposal in the Google doc is based on a new MV model, hence, new metadata fields and a new metadata model were being introduced (with types, optionality, etc). With reusing the table and view objects, we are not changing the existing metada

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
Thanks Walaa for driving it forward, looking forward to thinking about implementation of Materialized Views. I see Jan's point, the PR spec change is similar but does not seem to be completely aligned with the Draft Spec in the design doc: https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6

Re: Materialized Views: Next Steps

2024-05-08 Thread Jan Kaul
Well, everybody who actively contributed to the discussion on the original Google doc was in consensus. That's why I brought up the topic at the Community Sync on 2024-02-14 (https://youtu.be/uAQVGd5zV4I?t=890) to raise the awareness of the broader community. After which the discussion abo

Re: Materialized Views: Next Steps

2024-05-08 Thread Walaa Eldin Moustafa
The only consensus the community had was on the object model through the most recent voting thread [1]. This kind of consensus was not present during the doc discussions, and this should be evident from the fact that the last doc state listed 5 alternatives with no particular conclusion. I am not quite

Re: Materialized Views: Next Steps

2024-05-08 Thread Jan Kaul
The original Google doc discussed multiple aspects of the Materialized View spec. One was the storage model while others were related to the metadata. After we (Micah, Szehon, you, me) reached con

Re: Materialized Views: Next Steps

2024-05-08 Thread Walaa Eldin Moustafa
Thanks Jan. I think we moved on to more alignment steps beyond that doc a while ago. After that doc, we have discussed the topic further in 2 dev list threads and one more doc (with strictly two options for

Re: Materialized Views: Next Steps

2024-05-08 Thread Jan Kaul
Thanks Walaa for trying to move things along. However I don't think it's a good idea to start a separate discussion about the metadata for materialized views because we already had this discussion and reached consensus in this Google doc: https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwX

Re: Materialized Views: Next Steps

2024-05-07 Thread Walaa Eldin Moustafa
Thanks Steven. I feel it is needed so the MV spec is not scattered across the table and view spec pages. We may add a reference in each respective properties section. On Tue, May 7, 2024 at 10:04 AM Steven Wu wrote: > Walaa, thanks for initiating the next step. > > With the agreed model of separ

Re: Materialized Views: Next Steps

2024-05-07 Thread Steven Wu
Walaa, thanks for initiating the next step. With the agreed model of separate view and storage table, I am wondering if a separate materialized view spec page is needed. E.g., the new view metadata (view-materialized and view-storage-table) could probably be added to the view page directly to

Materialized Views: Next Steps

2024-05-06 Thread Walaa Eldin Moustafa
Hi Everyone, Thanks again for participating in the modeling discussion [1]. Since the outcome of this discussion was to model materialized views as separate objects, an Iceberg view and a table, I think the next step should be discussing the metadata details for each object. I have created a PR ht