> For example, if we want to validate that the tables referenced in the view
> exist, how can we do that when default-catalog isn't defined, since the
> view hasn't been created or loaded yet?

I don't think this is related to the view spec. How do we validate that a table exists without a default catalog, or do we always use the current session catalog?

Thanks,
Manu

On Fri, Apr 25, 2025 at 5:59 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

> Hi Jan,
>
> I think we still share the same understanding. Just to clarify: when I
> referred to late binding as “similar” to the proposal, I was acknowledging
> the distinction between view-level and table-level resolution. But as you
> noted, both follow a late binding model.
>
> That said, this still raises an interesting question and a potential gap:
> if default-catalog is only defined at query time, how should resolution
> work during view creation? For example, if we want to validate that the
> tables referenced in the view exist, how can we do that when
> default-catalog isn't defined, since the view hasn't been created or loaded
> yet?
>
> Thanks,
> Walaa.
>
> On Thu, Apr 24, 2025 at 7:02 AM Jan Kaul <jank...@mailbox.org.invalid>
> wrote:
>
>> Yes, I have the same understanding. The view catalog is resolved at query
>> time.
>>
>> As you mentioned before, it's good to distinguish between the physical
>> catalog and its reference used in SQL statements. The important part is
>> that the physical catalog of the view and the tables referenced in its
>> definition stay consistent. You could create a view in a given physical
>> catalog by referring to it as "catalogA", as in your first point. If you
>> then, given a different setup, refer to the same physical catalog as
>> "catalogB" in another session/environment, the view should still work.
>>
>> I would however rephrase your last point. Late binding applies to the
>> view catalog name and by extension to all partial table references when no
>> "default-catalog" is present. Resolving the view catalog name at query time
>> is not opposed to storing the view metadata in a catalog.
>>
>> Or maybe I don't entirely understand what you mean.
>>
>> Thanks
>>
>> Jan
>>
>> On 4/24/25 00:32, Walaa Eldin Moustafa wrote:
>>
>> Hi Jan,
>>
>> > The view is executed when it's being referenced in a SQL statement.
>> > That statement contains the information for the query engine to resolve
>> > the catalog of the view.
>>
>> If I’m understanding correctly, that means:
>>
>> * If the view is queried as SELECT * FROM catalogA.namespace.view, then
>> catalogA is considered the view’s catalog.
>>
>> * If the same view is later queried as SELECT * FROM
>> catalogB.namespace.view (after renaming catalogA to catalogB, and keeping
>> everything else the same), then catalogB becomes the view’s catalog.
>>
>> Is that interpretation correct? If so, it sounds to me like the catalog
>> is resolved at query time, based on how the view is referenced, not from
>> any stored metadata. That would imply some sort of late binding behavior
>> (similar to the proposal), as opposed to using some catalog that "stores"
>> the view definition.
>>
>> Thanks,
>> Walaa
>>
>> On Tue, Apr 22, 2025 at 11:01 AM Jan Kaul <jank...@mailbox.org.invalid>
>> wrote:
>>
>>> Hi Walaa,
>>>
>>> Thanks for clarifying the aspects of non-determinism. Let me try to
>>> address your questions.
>>>
>>> 1. This is my interpretation of the current spec: The view is executed
>>> when it's being referenced in a SQL statement. That statement contains the
>>> information for the query engine to resolve the catalog of the view. The
>>> query engine then uses that information to fetch the view metadata from the
>>> catalog. It also needs to temporarily keep track of which catalog it used
>>> to fetch the view metadata. It can then use that information to resolve the
>>> table references in the view's SQL definition in case no default catalog is
>>> specified.
>>>
>>> 2. The important part is that the catalog can be referenced at execution
>>> time. As long as that's the case I would assume the view can be created in
>>> any catalog.
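
To make points 1 and 2 concrete, here is a minimal Python sketch of that reading of the rule. It is an illustration only, not Iceberg's API: `ViewMetadata`, `resolve_table_ref`, and the field names are hypothetical, and namespaces are assumed to be single-level so that a three-part reference is already fully qualified. The engine passes in the catalog name it used to load the view, and that name is the fallback when the metadata carries no default catalog.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ViewMetadata:
    """Hypothetical, simplified stand-in for the view metadata fields discussed here."""
    default_namespace: str
    default_catalog: Optional[str] = None


def resolve_table_ref(ref: str, view_catalog: str, meta: ViewMetadata) -> Tuple[str, str, str]:
    """Expand a table reference from the view SQL to (catalog, namespace, table).

    Assumes single-level namespaces, so a three-part reference is already
    fully qualified.
    """
    parts = ref.split(".")
    if len(parts) > 3 or not all(parts):
        raise ValueError(f"unsupported table reference: {ref!r}")
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    if len(parts) == 1:
        # Bare table name: prepend the view's default namespace.
        parts = [meta.default_namespace] + parts
    # No catalog in the reference: use the default catalog if one is set,
    # otherwise the catalog name the engine used to load the view.
    catalog = meta.default_catalog if meta.default_catalog is not None else view_catalog
    return (catalog, parts[0], parts[1])


meta = ViewMetadata(default_namespace="namespace")
# Same view, loaded once as catalogA.namespace.view and once as catalogB.namespace.view.
print(resolve_table_ref("namespace.tbl", "catalogA", meta))
print(resolve_table_ref("namespace.tbl", "catalogB", meta))
```

Under these assumptions the partial reference follows whichever catalog name the view was loaded through, which is the late-binding behavior discussed in this thread.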
>>>
>>> I think your point is really valuable because the current specification
>>> can lead to some unintuitive behavior. For example, take the following
>>> statement:
>>>
>>> CREATE VIEW catalogA.sales.monthly_orders AS SELECT * FROM sales.orders;
>>>
>>> If the session default catalog is not "catalogA", the "sales.orders" in
>>> the view query would not be the same as just referencing "sales.orders" in
>>> a normal SQL statement. This is because without a "default-catalog", the
>>> catalog name of "sales.orders" would default to "catalogA".
>>>
>>> However, I like the current design of the view spec, because it has the
>>> "closure" property. Because the "view catalog" has to be
>>> known when executing a view, all the information required to resolve the
>>> table identifiers is contained in the view metadata (and the "view
>>> catalog"). I think that if you make the identifier resolution dependent on
>>> external parameters, it hinders portability.
>>>
>>> Thanks,
>>>
>>> Jan
>>>
>>> On 4/22/25 18:36, Walaa Eldin Moustafa wrote:
>>>
>>> Hi Jan,
>>>
>>> Thanks for the thoughtful feedback.
>>>
>>> I think it’s important we clarify a key point before going deeper:
>>>
>>> Non-determinism is not caused by session fallback behavior; it’s a
>>> *fundamental limitation of using table identifiers* alone, regardless of
>>> whether we use the current rule, the proposed fallback to the session’s
>>> default catalog, or even early vs. late binding.
>>>
>>> The same fully qualified identifier (e.g., catalogA.namespace.table) can
>>> resolve to different objects depending solely on engine-specific routing
>>> logic or catalog aliases. So determinism isn’t guaranteed just because an
>>> identifier is "fully qualified." The only reliable anchor for identity is
>>> the UUID. That’s why the proposed use of UUIDs is not just a hardening
>>> strategy. It’s the actual fix for correctness.
>>>
>>> To move the conversation forward, could you help clarify two things in
>>> the context of the current spec:
>>>
>>> * Where in the metadata is the “view catalog” stored, so that an engine
>>> knows to fall back to it if default-catalog is null?
>>>
>>> * Are we even allowed to create views in the session's default catalog
>>> (i.e., without specifying a catalog) in the current Iceberg spec?
>>>
>>> These questions are important because if we can’t unambiguously recover
>>> the "view catalog" from metadata, then defaulting to it is problematic. And
>>> if views can't be created in the default catalog, then the fallback rule
>>> doesn’t generalize.
>>>
>>> Thanks,
>>> Walaa.
>>>
>>> On Tue, Apr 22, 2025 at 3:14 AM Jan Kaul <jank...@mailbox.org.invalid>
>>> wrote:
>>>
>>>> Hi Walaa,
>>>>
>>>> Thank you for your proposal. If I understood correctly, your proposal is
>>>> composed of three parts:
>>>>
>>>> - Session default catalog as fallback for "default-catalog"
>>>>
>>>> - Session default namespace as fallback for "default-namespace"
>>>>
>>>> - Late binding + UUID validation
>>>>
>>>> I have some comments regarding these points.
>>>>
>>>> 1. Session default catalog as fallback for "default-catalog"
>>>>
>>>> Introducing a behavior that depends on the current session setup is in
>>>> my opinion the definition of "non-determinism". You could be running the
>>>> same query-engine and catalog-setup on different days, with different
>>>> default session catalogs (which is rather common), and would be getting
>>>> different results.
>>>>
>>>> Whereas with the current behavior, the view always produces the same
>>>> results. The current behavior has some rough edges in very niche use cases
>>>> but I think it is solid for most use cases.
>>>>
>>>> 2. Session default namespace as fallback for "default-namespace"
>>>>
>>>> Similar to the above.
>>>>
>>>> 3. Late binding + UUID validation
>>>>
>>>> If I understand it correctly, the current implementation already uses
>>>> late binding.
>>>>
>>>> Generally, having UUID validation makes the setup more robust, which is
>>>> great. However, having UUID validation still requires us to have a portable
>>>> table identifier specification. Even if we have the UUIDs of the referenced
>>>> tables from the view, there simply isn't an interface that lets us use
>>>> those UUIDs. The catalog interface is defined in terms of table
>>>> identifiers.
>>>>
>>>> So we always require a working catalog setup and suitable table
>>>> identifiers to obtain the table metadata. We can use the UUIDs to verify
>>>> that we loaded the correct table, but this can only be done after we have
>>>> used some identifier. This means there is no way of using UUIDs without a
>>>> functioning catalog/identifier setup.
>>>>
>>>> In conclusion, I prefer the current behavior for "default-catalog"
>>>> because it is more deterministic in my opinion. And I think the current
>>>> spec does a good job for multi-engine table identifier resolution. I see
>>>> the UUID validation more as an additional hardening strategy.
>>>>
>>>> Thanks
>>>>
>>>> Jan
>>>>
>>>> On 4/21/25 17:38, Walaa Eldin Moustafa wrote:
>>>>
>>>> Thanks Renjie!
>>>>
>>>> The existing spec has some guidance on resolving catalogs on the fly
>>>> already (to address the case of view text with table identifiers missing
>>>> the catalog part). The guidance is to use the catalog where the view is
>>>> stored. But I find this rule hard to interpret or use. The catalog itself
>>>> is a logical construct, such as a federated catalog that delegates to
>>>> multiple physical backends (e.g., HMS and REST). In such cases, the catalog
>>>> (e.g., `my_catalog` in `my_catalog.namespace1.table1`) doesn’t physically
>>>> store the tables; it only routes requests to underlying stores.
>>>> Therefore, defaulting identifier resolution based on the catalog where
>>>> the view is "stored" doesn’t align with how catalogs actually behave in
>>>> practice.
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>> On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu <liurenjie2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Walaa:
>>>>>
>>>>> Thanks for the proposal.
>>>>>
>>>>> I've reviewed the doc, but in general I have some concerns with
>>>>> resolving catalog names on the fly with query-engine-defined catalog
>>>>> names. This introduces some flexibility at first glance, but also makes
>>>>> misconfiguration difficult to explain.
>>>>>
>>>>> But I agree with one part: we should store the resolved table UUID in
>>>>> view metadata, as table/view renaming may introduce errors that are
>>>>> difficult for users to understand.
>>>>>
>>>>> On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> Looking forward to keeping up the momentum and closing out the MV
>>>>>> spec as well. I’m hoping we can proceed to a vote next week.
>>>>>>
>>>>>> Here is a summary in case that helps. The proposal outlines a
>>>>>> strategy for handling table identifiers in Iceberg view metadata, with
>>>>>> the goal of ensuring correctness, portability, and engine compatibility.
>>>>>> It recommends resolving table identifiers at read time (late binding)
>>>>>> rather than creation time, and introduces UUID-based validation to
>>>>>> maintain identity guarantees across engines or sessions. It also revises
>>>>>> how default-catalog and default-namespace are handled (defaulting both to
>>>>>> the session context if not explicitly set) to better align with engine
>>>>>> behavior and improve cross-engine interoperability.
>>>>>>
>>>>>> Please let me know your thoughts.
>>>>>>
>>>>>> Thanks,
>>>>>> Walaa.
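
The combination of late binding and UUID validation summarized above can be sketched roughly as follows. This is a hedged illustration in Python, not an Iceberg API: the dict-based `tables` mapping stands in for a catalog lookup, `load_and_verify` and `TableIdentityMismatch` are hypothetical names, and storing the expected UUID alongside the view is what the proposal suggests rather than something the current spec defines. The point it shows is the ordering raised earlier in the thread: the table has to be loaded by identifier first, and only then can its UUID be compared.

```python
from typing import Dict, Tuple

Identifier = Tuple[str, str, str]  # (catalog, namespace, table)


class TableIdentityMismatch(Exception):
    """The identifier resolved, but to a table with a different UUID."""


def load_and_verify(tables: Dict[Identifier, dict], ident: Identifier,
                    expected_uuid: str) -> dict:
    # Step 1: look the table up by identifier at read time (late binding).
    # The dict is a toy stand-in for a catalog's load-table call.
    table = tables[ident]
    # Step 2: the UUID can only be checked once the metadata has been loaded.
    actual = table["table-uuid"]
    if actual != expected_uuid:
        raise TableIdentityMismatch(
            f"{'.'.join(ident)} has UUID {actual}, expected {expected_uuid}"
        )
    return table


# Toy catalog contents; the UUID value is made up for the example.
tables = {("catalogA", "sales", "orders"): {"table-uuid": "uuid-1"}}
print(load_and_verify(tables, ("catalogA", "sales", "orders"), "uuid-1"))
```

If the identifier later routes to a different table, the check fails loudly instead of silently reading the wrong data, but it still depends on a working identifier lookup.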
>>>>>>
>>>>>> On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Eduard and Sung! I have addressed the comments.
>>>>>>>
>>>>>>> One key point to keep in mind is that catalog names in the spec
>>>>>>> refer to logical catalogs, i.e., the first part of a three-part
>>>>>>> identifier. These correspond to Spark's DataSourceV2 catalogs, Trino
>>>>>>> connectors, and similar constructs. This is a level of abstraction
>>>>>>> above physical catalogs, which are not referenced or used in the view
>>>>>>> spec. The reason is that table identifiers in the view definition/text
>>>>>>> itself refer to logical catalogs, not physical ones (since they
>>>>>>> interface directly with the engine and not a specific metastore).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.
>>>>>>>
>>>>>>> On Wed, Apr 16, 2025 at 6:15 AM Sung Yun <sungwy...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you Walaa for the proposal. I think view portability is a
>>>>>>>> very important topic for us to continue discussing, as it relies on
>>>>>>>> many assumptions within the data ecosystem to function, as you've
>>>>>>>> highlighted well in the document.
>>>>>>>>
>>>>>>>> I've added a few comments around how this may impact the permission
>>>>>>>> questions the engines will be asking, and whether that is the desired
>>>>>>>> behavior.
>>>>>>>>
>>>>>>>> Sung
>>>>>>>>
>>>>>>>> On Wed, Apr 16, 2025 at 7:32 AM Eduard Tudenhöfner <
>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Thanks Walaa for tackling this problem. I've added a few comments
>>>>>>>>> to get a better understanding of what this will look like in the
>>>>>>>>> actual implementation.
>>>>>>>>>
>>>>>>>>> Eduard
>>>>>>>>>
>>>>>>>>> On Tue, Apr 15, 2025 at 7:09 PM Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Everyone,
>>>>>>>>>>
>>>>>>>>>> Starting this thread to resume our discussion on how to reference
>>>>>>>>>> table identifiers from Iceberg metadata, a key aspect of the view
>>>>>>>>>> specification, particularly in relation to the MV (materialized view)
>>>>>>>>>> extensions.
>>>>>>>>>>
>>>>>>>>>> I had the chance to speak offline with a few community members to
>>>>>>>>>> better understand how the current spec is being interpreted. Those
>>>>>>>>>> conversations served as inputs to a new proposal on how table
>>>>>>>>>> identifier references could be represented in metadata.
>>>>>>>>>>
>>>>>>>>>> You can find the proposal here [1]. I look forward to your
>>>>>>>>>> feedback and working together to move this forward so we can
>>>>>>>>>> finalize the MV spec as well.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Walaa.