In Spark, I believe that the USE command sets the current catalog and namespace. This affects both where the view is created and how unqualified table identifiers are resolved. I also don't see an issue with saving the current catalog and namespace into the view metadata's default-catalog and default-namespace fields.
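To make that concrete, here is a rough Python sketch of the behavior I have in mind. It is not Spark or Iceberg code: `Session`, `ViewMetadata`, `create_view`, and `resolve` are hypothetical names used only for illustration, and it assumes single-level namespaces so that a two-part identifier is always namespace.table. The fallback to the view's catalog when default-catalog is null follows the current spec as described further down in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical stand-in for an engine session (not a Spark API)."""
    current_catalog: str
    current_namespace: str

    def use(self, catalog: str, namespace: str) -> None:
        # Models `USE catalog.namespace`: only the session defaults change.
        self.current_catalog = catalog
        self.current_namespace = namespace


@dataclass
class ViewMetadata:
    """Only the two view-metadata fields discussed in this thread."""
    default_catalog: Optional[str]
    default_namespace: str


def create_view(session: Session) -> ViewMetadata:
    # Assumption: the engine snapshots the session defaults at CREATE VIEW time.
    return ViewMetadata(session.current_catalog, session.current_namespace)


def resolve(identifier: str, meta: ViewMetadata, view_catalog: str) -> str:
    """Expand a table identifier found in the view SQL to catalog.namespace.table.

    Assumes single-level namespaces: 1 part = table, 2 parts = namespace.table,
    3 parts = catalog.namespace.table.
    """
    parts = identifier.split(".")
    if len(parts) == 3:
        return identifier  # already fully qualified
    # Fall back to the view's own catalog when default-catalog is null.
    catalog = meta.default_catalog if meta.default_catalog is not None else view_catalog
    if len(parts) == 2:
        return f"{catalog}.{identifier}"
    if len(parts) == 1:
        return f"{catalog}.{meta.default_namespace}.{identifier}"
    raise ValueError(f"unsupported identifier: {identifier!r}")


if __name__ == "__main__":
    session = Session("spark_catalog", "default")
    session.use("prod", "sales")      # USE prod.sales
    meta = create_view(session)       # defaults captured in the view metadata
    session.use("dev", "scratch")     # a later USE does not touch the stored view
    print(resolve("orders", meta, view_catalog="prod"))
```

In this model a later USE in another session changes nothing about how the stored view resolves `orders`, which is the portability argument for saving the defaults in the metadata.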
On Wed, Apr 30, 2025 at 5:12 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote: > > I think that's the lesser evil compared to Iceberg specifying how > engines should resolve identifiers > > I think this is also similar to the previous point. It is the other way > around. Right now the spec dictates how to resolve (through employing a > view-specific `default-catalog` field). The proposal is suggesting to get > out of this space and let engines handle it similar to how they handle all > identifiers. > > On Wed, Apr 30, 2025 at 5:07 PM Walaa Eldin Moustafa < > wa.moust...@gmail.com> wrote: > >> > I thought "default-catalog" could be set via the USE command. >> >> Benny, I think this is a misconception or miscommunication. The USE >> command has no impact on the `default-catalog` field. In fact, the >> proposal's direction is exactly to establish that USE command should >> influence how tables are resolved, same like everywhere else. Right now it >> is not the case under the current spec. >> >> >> On Wed, Apr 30, 2025 at 3:17 PM Benny Chow <btc...@gmail.com> wrote: >> >>> > there is no SQL construct today to explicitly set default-catalog >>> >>> I thought "default-catalog" could be set via the USE command. >>> >>> I generally agree with Dan about requiring consistent catalog names. I >>> think that's the lesser evil compared to Iceberg specifying how engines >>> should resolve identifiers. Another thing to consider is that identifier >>> resolution can be very expensive at query validation time if identifiers >>> need to be looked up from a bunch of places. Hopefully, it should be >>> possible to define a view in such a way that identifiers can be resolved on >>> the first try. >>> >>> Benny >>> >>> On Tue, Apr 29, 2025 at 10:29 PM Walaa Eldin Moustafa < >>> wa.moust...@gmail.com> wrote: >>> >>>> Hi Rishabh, >>>> >>>> You're right that the proposal touches on two aspects, and resolution >>>> rules are one of them. 
The other aspect is the proposal's position that >>>> table identifiers should be stored in metadata exactly as they appear in >>>> the view text (e.g., even if they're two-part or partially qualified), >>>> along with their corresponding UUIDs for validation. This applies both to >>>> referenced input tables and the storage table identifier in materialized >>>> views. >>>> >>>> We may be able to converge on this storage format even if we haven't >>>> yet converged on the resolution fallback rules. I believe both resolution >>>> strategies currently being discussed would still lead to storing >>>> identifiers in this way. >>>> >>>> I'm supportive of moving forward with consensus on the identifier >>>> storage format. That said, we may continue to run into questions related to >>>> resolution during implementation. For example: Should the storage table >>>> identifier follow the same default-catalog and default-namespace resolution >>>> behavior as other table references? >>>> >>>> Thanks, >>>> Walaa. >>>> >>>> On Tue, Apr 29, 2025 at 10:07 PM Rishabh Bhatia < >>>> bhatiarishab...@gmail.com> wrote: >>>> >>>>> Hello Walaa, >>>>> >>>>> Thanks for starting this discussion. >>>>> >>>>> I think we should decouple at least the MV Spec from the proposal to >>>>> change the current behavior of view resolution. >>>>> >>>>> We can continue having the discussion if the current view spec needs >>>>> to be changed or not. Based on the decision at a later point if required >>>>> we >>>>> can update the view resolution rule. >>>>> >>>>> >>>>> Thanks, >>>>> Rishabh >>>>> >>>>> On Mon, Apr 28, 2025 at 3:22 PM Walaa Eldin Moustafa < >>>>> wa.moust...@gmail.com> wrote: >>>>> >>>>>> Correction of typo: both engines seem to set default-catalog to the >>>>>> view catalog if it is defined, or to null if the view catalog is not >>>>>> defined. 
>>>>>> >>>>>> On Mon, Apr 28, 2025 at 3:06 PM Walaa Eldin Moustafa < >>>>>> wa.moust...@gmail.com> wrote: >>>>>> >>>>>>> Hi Dan, >>>>>>> >>>>>>> Thanks again for your response. >>>>>>> >>>>>>> I agree that catalog renaming is an environmental event, but it's a >>>>>>> real one that happens frequently in practice. >>>>>>> Saying that the Iceberg spec cannot accommodate something as common >>>>>>> as catalog renaming feels very restrictive, and could make the spec less >>>>>>> practical, even unusable, for real-world deployments. >>>>>>> I’m sharing this from the perspective of a large data lake >>>>>>> environment where views are heavily deployed and operationalized. >>>>>>> >>>>>>> Further, it's worth noting that the table spec is resilient to >>>>>>> catalog renaming, but the view spec is not. If we have an opportunity to >>>>>>> make the view spec similarly resilient, I wonder why not? >>>>>>> Both specifications are deterministic in their definition, but one >>>>>>> is more fragile to environmental changes than the other. Improving >>>>>>> resilience does not sacrifice determinism. It simply makes views safer >>>>>>> and >>>>>>> more portable over time. >>>>>>> >>>>>>> Separately, given that there is no SQL construct today to explicitly >>>>>>> set default-catalog at creation time, what is the intuition behind how >>>>>>> engines like Spark and Trino currently assign default-catalog? >>>>>>> Today, both engines seem to set default-catalog to null if the view >>>>>>> catalog is defined, or to the view catalog if not. >>>>>>> What was the intended thought process behind this behavior? >>>>>>> >>>>>>> Thanks, >>>>>>> Walaa >>>>>>> >>>>>>> On Mon, Apr 28, 2025 at 1:33 PM Daniel Weeks <dwe...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> Walaa, >>>>>>>> >>>>>>>> > tables inside views remain reachable after a catalog rename >>>>>>>> >>>>>>>> This problem stems from the exact environmental/configuration issue >>>>>>>> that we should not be trying to address. 
I don't think we would expect >>>>>>>> references to survive a catalog rename. That's not something covered >>>>>>>> by >>>>>>>> the spec and needs to be handled separately as a platform-level >>>>>>>> migration >>>>>>>> specific to the affected environment. >>>>>>>> >>>>>>>> The identifier resolution logic is clear and deterministic. It >>>>>>>> should not matter whether an engine resolves and encodes the >>>>>>>> default-catalog or leaves it to the resolution rules. >>>>>>>> >>>>>>>> The issue isn't with how the spec is defined, but rather view >>>>>>>> behavior when you start altering the environment around it, which isn't >>>>>>>> something we should be trying to define here. >>>>>>>> >>>>>>>> -Dan >>>>>>>> >>>>>>>> On Mon, Apr 28, 2025 at 12:17 PM Walaa Eldin Moustafa < >>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Dan, >>>>>>>>> >>>>>>>>> Thanks for chiming in. >>>>>>>>> >>>>>>>>> I believe the issues we’re seeing now go beyond just catalog >>>>>>>>> naming consistency. The behavior around default-catalog itself >>>>>>>>> introduces >>>>>>>>> resolution inconsistencies even when catalog names are consistent. >>>>>>>>> For example: >>>>>>>>> >>>>>>>>> * When default-catalog is set to null, tables inside views remain >>>>>>>>> reachable after a catalog rename. But if it is set to a non-null >>>>>>>>> value, >>>>>>>>> table references will break. >>>>>>>>> >>>>>>>>> * default-catalog causes table references inside views to be early >>>>>>>>> bound (i.e., bound at view creation time, especially when using a >>>>>>>>> non-null >>>>>>>>> value), while table references inside standalone queries are late >>>>>>>>> bound >>>>>>>>> (bound at query time). This creates inconsistencies when resolving >>>>>>>>> the same >>>>>>>>> table name inside and outside views, even within the same job. >>>>>>>>> >>>>>>>>> * It causes Spark's and Trino's behavior to drift from the spec.
>>>>>>>>> There is no way to fully align Spark's behavior without making >>>>>>>>> invasive >>>>>>>>> changes to the Spark SQL grammar and the View DataSource API >>>>>>>>> (specifically >>>>>>>>> on the CREATE side). This challenge would extend to other engines >>>>>>>>> too. Both >>>>>>>>> Spark and Trino set this field based on a heuristic in today's >>>>>>>>> implementation. >>>>>>>>> >>>>>>>>> * With view nesting (views depending on views), these >>>>>>>>> inconsistencies amplify further, forcing users and engines to reason >>>>>>>>> about >>>>>>>>> catalog resolution at every level in the view tree. >>>>>>>>> >>>>>>>>> * It will be difficult to migrate Hive views to Iceberg with that >>>>>>>>> model. Migrated Hive views will have to unfollow that spec. >>>>>>>>> >>>>>>>>> How would you suggest approaching the engine-level changes >>>>>>>>> required to support the current default-catalog field? >>>>>>>>> Also, do you believe the Spark and Trino communities would align >>>>>>>>> around having table resolution behave inconsistently between queries >>>>>>>>> and >>>>>>>>> views, or inconsistency between Iceberg and other types of views? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Walaa >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Apr 28, 2025 at 11:34 AM Daniel Weeks <dwe...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I would agree with Jan's summary of why 'default-catalog' was >>>>>>>>>> introduced, but I think we need to step back and align on what we are >>>>>>>>>> really attempting to support in the spec. >>>>>>>>>> >>>>>>>>>> The issues we're discussing largely stem from using multiple >>>>>>>>>> engines with cross catalog references and configurations where >>>>>>>>>> catalog >>>>>>>>>> names are not aligned. If we have multiple engines that all have >>>>>>>>>> the same >>>>>>>>>> catalog names/configurations, the current spec implementation is well >>>>>>>>>> defined for table resolution even across catalogs. 
The >>>>>>>>>> 'default-catalog' >>>>>>>>>> (and namespace equivalent) was intended to address the resolution >>>>>>>>>> within >>>>>>>>>> the context of the sql text, not to address catalog/naming >>>>>>>>>> inconsistencies. >>>>>>>>>> >>>>>>>>>> I feel like we're trying to adapt the original intent to address >>>>>>>>>> the catalog naming/configuration and would argue that we shouldn't >>>>>>>>>> attempt >>>>>>>>>> to do that as part of the spec. Inconsistently named catalogs are a >>>>>>>>>> reality, but we should consider that a configuration/environmental >>>>>>>>>> issue, >>>>>>>>>> not something to solve for in the spec. >>>>>>>>>> >>>>>>>>>> We should support and advocate for consistency in catalog naming >>>>>>>>>> and define the spec along those lines. The fact is that with all of >>>>>>>>>> the >>>>>>>>>> recent work that's gone into making catalogs pluggable, it makes >>>>>>>>>> more sense >>>>>>>>>> to just register catalog configuration with consistent names (even >>>>>>>>>> if you >>>>>>>>>> have to duplicate the configuration for supporting existing >>>>>>>>>> readers/writers). I think it's better to provide a path toward >>>>>>>>>> consistency >>>>>>>>>> than to normalize complicated schemes to workaround the issues >>>>>>>>>> caused by >>>>>>>>>> environmental/configuration inconsistencies. >>>>>>>>>> >>>>>>>>>> If the goal is to create clever ways to hack the late binding >>>>>>>>>> resolution to swap in different catalogs or make references >>>>>>>>>> contextual, I >>>>>>>>>> feel like that is something we should strongly discourage as it >>>>>>>>>> leads to >>>>>>>>>> confusion about what is resolved as part of the query. >>>>>>>>>> >>>>>>>>>> At this point, I don't see a good argument to add >>>>>>>>>> additional configuration or change the resolution behaviors. 
>>>>>>>>>> >>>>>>>>>> -Dan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Apr 28, 2025 at 12:40 AM Jan Kaul >>>>>>>>>> <jank...@mailbox.org.invalid> wrote: >>>>>>>>>> >>>>>>>>>>> I think the intention with the "default-catalog" was that every >>>>>>>>>>> query engine uses it to store its session default catalog at the >>>>>>>>>>> time of >>>>>>>>>>> creating the view. This way the view could be reused in another >>>>>>>>>>> session. >>>>>>>>>>> The idea was not to introduce an additional SQL syntax to set the >>>>>>>>>>> default-catalog. >>>>>>>>>>> >>>>>>>>>>> Generally we have different environments we want to support with >>>>>>>>>>> the view spec: >>>>>>>>>>> >>>>>>>>>>> 1. Consistent catalog naming >>>>>>>>>>> >>>>>>>>>>> When the environment supports it, using consistent catalog names >>>>>>>>>>> can have a great benefit for multi-catalog, multi-engine setups. >>>>>>>>>>> With >>>>>>>>>>> consistent catalog names, using the "default-catalog" field works >>>>>>>>>>> without >>>>>>>>>>> any issues. >>>>>>>>>>> >>>>>>>>>>> 2. Inconsistent catalog naming >>>>>>>>>>> >>>>>>>>>>> This can be the case when different query engines refer to the >>>>>>>>>>> same physical catalog by different names. This often happens because >>>>>>>>>>> different query engines use different strategies to setup the >>>>>>>>>>> catalogs. If >>>>>>>>>>> catalogs have inconsistent naming, using the "default-catalog" >>>>>>>>>>> field does >>>>>>>>>>> not work because it is not guaranteed that the catalog name can be >>>>>>>>>>> resolved >>>>>>>>>>> with another engine. Using the "view catalog" as a fallback is a >>>>>>>>>>> better >>>>>>>>>>> solution for this use case, as it avoids catalog names altogether. >>>>>>>>>>> It is >>>>>>>>>>> however limited to table references in the same catalog. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What do you think of introducing a view property that specifies >>>>>>>>>>> if the "default-catalog" or the "view catalog" should be used? 
This >>>>>>>>>>> way, >>>>>>>>>>> you could use the "default-catalog" in environments where you can >>>>>>>>>>> guarantee >>>>>>>>>>> consistent naming, but you would be able to directly fallback to the >>>>>>>>>>> "view-catalog" when you don't have consistent naming. The query >>>>>>>>>>> engines >>>>>>>>>>> could set the default for this view property at creation time. >>>>>>>>>>> Spark for >>>>>>>>>>> example could set it to automatically use the "view catalog". >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Jan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 4/26/25 05:33, Walaa Eldin Moustafa wrote: >>>>>>>>>>> >>>>>>>>>>> To help folks catch up on the latest discussions and >>>>>>>>>>> interpretation of the spec, I have summarized everything we >>>>>>>>>>> discussed so >>>>>>>>>>> far at the top of the proposal document (here >>>>>>>>>>> <https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0>). >>>>>>>>>>> I have slightly updated the proposal to be in sync with the new >>>>>>>>>>> interpretation to avoid confusion. In summary: >>>>>>>>>>> >>>>>>>>>>> * Remove default-catalog and default-namespace fields from the >>>>>>>>>>> view spec completely. >>>>>>>>>>> >>>>>>>>>>> * Hence, we do not attempt to define separate view-level default >>>>>>>>>>> catalogs or namespaces. >>>>>>>>>>> >>>>>>>>>>> Instead: >>>>>>>>>>> >>>>>>>>>>> * If a table identifier inside a view lacks a catalog qualifier, >>>>>>>>>>> engines should resolve it using the current engine catalog at query >>>>>>>>>>> time. >>>>>>>>>>> >>>>>>>>>>> * Reference table identifiers in the metadata exactly as they >>>>>>>>>>> appear in the view SQL text. >>>>>>>>>>> >>>>>>>>>>> * If an identifier lacks the catalog part at creation, it should >>>>>>>>>>> still lack a catalog in the stored metadata. >>>>>>>>>>> >>>>>>>>>>> * Store UUIDs alongside table identifiers whenever possible. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Walaa. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 25, 2025 at 5:18 PM Walaa Eldin Moustafa < >>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks for the contribution Benny! +1 to the confusion the >>>>>>>>>>>> fallback creates. Also just to be clear, at this point and after >>>>>>>>>>>> clarifying >>>>>>>>>>>> the current spec intentions, I am convinced that we should remove >>>>>>>>>>>> the >>>>>>>>>>>> default catalog and default namespace fields altogether. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Walaa. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:13 PM Benny Chow <btc...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I'd like to contribute my opinions on this: >>>>>>>>>>>>> >>>>>>>>>>>>> - I don't particularly like the current behavior of "default >>>>>>>>>>>>> to the view's catalog when default-catalog is not set". >>>>>>>>>>>>> Fundamentally, I >>>>>>>>>>>>> believe the intent of default-catalog and default-namespace is >>>>>>>>>>>>> there to >>>>>>>>>>>>> help users write more concise SQL. >>>>>>>>>>>>> - spark session catalog is engine specific and I don't think >>>>>>>>>>>>> we should design something that says first use this catalog, then >>>>>>>>>>>>> that >>>>>>>>>>>>> catalog.. or that catalog. For example, resolving identifiers >>>>>>>>>>>>> using >>>>>>>>>>>>> default-catalog -> view's catalog -> session catalog is not good. >>>>>>>>>>>>> - We gotta support non-Iceberg tables otherwise I see no value >>>>>>>>>>>>> in putting views in the catalog to share with other engines >>>>>>>>>>>>> - Interoperability between different engine types is very hard >>>>>>>>>>>>> due to dialect issues... so I think we should focus on supporting >>>>>>>>>>>>> different >>>>>>>>>>>>> clusters of the same engine type on a shared catalog. For >>>>>>>>>>>>> example, AI and >>>>>>>>>>>>> BI clusters on Spark sharing the same views in a REST catalog. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Coincidentally, I think the ultimate solution is along the >>>>>>>>>>>>> lines of something Russell proposed last year: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> https://lists.apache.org/thread/hoskfx8y3kvrcww52l4w9dxghp3pnlm7 >>>>>>>>>>>>> >>>>>>>>>>>>> We've been looking at this interoperable identifier problem >>>>>>>>>>>>> through the lens of catalog resolution but maybe the right >>>>>>>>>>>>> approach is >>>>>>>>>>>>> really about templating. >>>>>>>>>>>>> >>>>>>>>>>>>> I would extend Russell's idea to allow identifiers in a view >>>>>>>>>>>>> to span catalogs to support non-Iceberg tables. Also, the >>>>>>>>>>>>> default-catalog >>>>>>>>>>>>> property could be templated as well. >>>>>>>>>>>>> >>>>>>>>>>>>> Thoughts? >>>>>>>>>>>>> Benny >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 25, 2025 at 4:02 PM Walaa Eldin Moustafa < >>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks Steven! How do you recommend making Spark >>>>>>>>>>>>>> implementation conform to the spec? Do we need Spark SQL >>>>>>>>>>>>>> extensions and/or >>>>>>>>>>>>>> Spark catalog APIs for that? >>>>>>>>>>>>>> >>>>>>>>>>>>>> How do you recommend reconciling the inconsistencies I shared >>>>>>>>>>>>>> regarding many resolution methods not consistently being >>>>>>>>>>>>>> followed in >>>>>>>>>>>>>> different scenarios (view vs child table resolution, query vs >>>>>>>>>>>>>> view >>>>>>>>>>>>>> resolution)? Note these occur when the default catalog is set to >>>>>>>>>>>>>> a non-null >>>>>>>>>>>>>> value. If it helps, I can share concrete examples. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Walaa. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 3:52 PM Steven Wu < >>>>>>>>>>>>>> stevenz...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The core issue is on the fall back behavior when >>>>>>>>>>>>>>> `default-catalog` is >>>>>>>>>>>>>>> not defined. 
Current view spec says the fallback should be >>>>>>>>>>>>>>> the catalog >>>>>>>>>>>>>>> where the view is defined. It doesn't really matter what the >>>>>>>>>>>>>>> catalog >>>>>>>>>>>>>>> is named (catalogX) by the read engine. >>>>>>>>>>>>>>> - If a view refers to the tables in the same catalog, this >>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>> non-ambiguous and reasonable fallback behavior. >>>>>>>>>>>>>>> - If a view refers to tables from another catalog, catalog >>>>>>>>>>>>>>> names >>>>>>>>>>>>>>> should be included in the reference name already. So no >>>>>>>>>>>>>>> ambiguity >>>>>>>>>>>>>>> there either. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Potential inconsistent naming of catalog is a separate >>>>>>>>>>>>>>> problem, which >>>>>>>>>>>>>>> Iceberg view spec probably cannot solve. We can only >>>>>>>>>>>>>>> recommend that >>>>>>>>>>>>>>> catalog should be named consistently across usage for better >>>>>>>>>>>>>>> interoperability on name references. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This proposal is to change the fallback behavior to engine's >>>>>>>>>>>>>>> session >>>>>>>>>>>>>>> default catalog. I am not sure it is better than the current >>>>>>>>>>>>>>> fallback >>>>>>>>>>>>>>> behavior. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> > Today’s Spark behavior explicitly differs from this idea. >>>>>>>>>>>>>>> Spark resolves table identifiers during view creation using the >>>>>>>>>>>>>>> session’s >>>>>>>>>>>>>>> default catalog, not a supplied `default-catalog`. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I would argue that is a Spark implementation issue for not >>>>>>>>>>>>>>> conforming >>>>>>>>>>>>>>> to the spec. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 1:17 PM Walaa Eldin Moustafa >>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Hi Jan, >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Thanks again for continuing the discussion. 
I want to >>>>>>>>>>>>>>> highlight a few fundamental issues around the interpretation of >>>>>>>>>>>>>>> default-catalog: >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Here is the real catch: >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > * default-catalog cannot logically be defined at view >>>>>>>>>>>>>>> creation time. It would be circular: the view needs to exist >>>>>>>>>>>>>>> before its >>>>>>>>>>>>>>> metadata (and hence default-catalog) can exist. This is visible >>>>>>>>>>>>>>> in Spark’s >>>>>>>>>>>>>>> implementation, where `default-catalog` is not used. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > * Introducing a creation-time default-catalog setting >>>>>>>>>>>>>>> would require extending SQL syntax and engine APIs to promote >>>>>>>>>>>>>>> it to a >>>>>>>>>>>>>>> first-class view concept. This would be intrusive, >>>>>>>>>>>>>>> non-intuitive, and >>>>>>>>>>>>>>> realistically very difficult to standardize across engines. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > * Today’s Spark behavior explicitly differs from this >>>>>>>>>>>>>>> idea. Spark resolves table identifiers during view creation >>>>>>>>>>>>>>> using the >>>>>>>>>>>>>>> session’s default catalog, not a supplied `default-catalog`. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > * Hypothetically even if we patched in a creation-time >>>>>>>>>>>>>>> default-catalog, it would create an inconsistent binding model >>>>>>>>>>>>>>> between >>>>>>>>>>>>>>> tables vs views (early vs late), and between tables in views >>>>>>>>>>>>>>> and in queries >>>>>>>>>>>>>>> (again early vs late). For example, views and tables in queries >>>>>>>>>>>>>>> can >>>>>>>>>>>>>>> withstand default catalog renames, but tables cannot when they >>>>>>>>>>>>>>> are used >>>>>>>>>>>>>>> inside views -- it even applies to views inside views, which >>>>>>>>>>>>>>> makes this >>>>>>>>>>>>>>> very hard to reason about considering nesting. 
>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Thanks, >>>>>>>>>>>>>>> > Walaa >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > On Fri, Apr 25, 2025 at 7:00 AM Jan Kaul >>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> @Walaa: >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I would argue that when you run a CREATE VIEW statement >>>>>>>>>>>>>>> the query engine knows which catalog the view is being created >>>>>>>>>>>>>>> in. So even >>>>>>>>>>>>>>> though we typically use late binding to resolve the view >>>>>>>>>>>>>>> catalog at query >>>>>>>>>>>>>>> time, it can also be used at creation time. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> The query engine would need to keep track of the "view >>>>>>>>>>>>>>> catalog" where the view is going to be created. It can use >>>>>>>>>>>>>>> that catalog >>>>>>>>>>>>>>> to resolve partial table identifiers if "default-catalog" is >>>>>>>>>>>>>>> not set. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> It can lead to some unintuitive behavior, where partial >>>>>>>>>>>>>>> identifiers in the view query resolve to a different catalog >>>>>>>>>>>>>>> compared to >>>>>>>>>>>>>>> using them outside of a view. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> CREATE VIEW catalogA.sales.monthly_orders AS SELECT * >>>>>>>>>>>>>>> from sales.orders; >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> If the session default catalog is not "catalogA", the >>>>>>>>>>>>>>> "sales.orders" in the view query would not be the same as just >>>>>>>>>>>>>>> referencing >>>>>>>>>>>>>>> "sales.orders" in a normal SQL statement. This is because >>>>>>>>>>>>>>> without a >>>>>>>>>>>>>>> "default-catalog", the catalog name of "sales.orders" would >>>>>>>>>>>>>>> default to >>>>>>>>>>>>>>> "catalogA", which is the view's catalog.
>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Thanks, >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Jan >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> On 4/25/25 04:05, Manu Zhang wrote: >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> For example, if we want to validate that the tables >>>>>>>>>>>>>>> referenced in the view exist, how can we do that when >>>>>>>>>>>>>>> default-catalog isn't >>>>>>>>>>>>>>> defined, since the view hasn't been created or loaded yet? >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I don't think this is related to view spec. How do we >>>>>>>>>>>>>>> validate that a table exists without a default catalog, or do >>>>>>>>>>>>>>> we always use >>>>>>>>>>>>>>> the current session catalog? >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Thanks, >>>>>>>>>>>>>>> >> Manu >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> On Fri, Apr 25, 2025 at 5:59 AM Walaa Eldin Moustafa < >>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> Hi Jan, >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> I think we still share the same understanding. Just to >>>>>>>>>>>>>>> clarify: when I referred to late binding as “similar” to the >>>>>>>>>>>>>>> proposal, I >>>>>>>>>>>>>>> was acknowledging the distinction between view-level and >>>>>>>>>>>>>>> table-level >>>>>>>>>>>>>>> resolution. But as you noted, both follow a late binding model. >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> That said, this still raises an interesting question and >>>>>>>>>>>>>>> a potential gap: if default-catalog is only defined at query >>>>>>>>>>>>>>> time, how >>>>>>>>>>>>>>> should resolution work during view creation? For example, if we >>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>> validate that the tables referenced in the view exist, how can >>>>>>>>>>>>>>> we do that >>>>>>>>>>>>>>> when default-catalog isn't defined, since the view hasn't been >>>>>>>>>>>>>>> created or >>>>>>>>>>>>>>> loaded yet? >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> Thanks, >>>>>>>>>>>>>>> >>> Walaa. 
>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> On Thu, Apr 24, 2025 at 7:02 AM Jan Kaul >>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Yes, I have the same understanding. The view catalog is >>>>>>>>>>>>>>> resolved at query time. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> As you mentioned before, it's good to distinguish >>>>>>>>>>>>>>> between the physical catalog and its reference used in SQL >>>>>>>>>>>>>>> statements. The >>>>>>>>>>>>>>> important part is that the physical catalog of the view and the >>>>>>>>>>>>>>> tables >>>>>>>>>>>>>>> referenced in its definition stay consistent. You could create >>>>>>>>>>>>>>> a view in a >>>>>>>>>>>>>>> given physical catalog by referring to it as "catalogA", as in >>>>>>>>>>>>>>> your first >>>>>>>>>>>>>>> point. If you then, given a different setup, refer to the same >>>>>>>>>>>>>>> physical >>>>>>>>>>>>>>> catalog as "catalogB" in another session/environment, the >>>>>>>>>>>>>>> behavior should >>>>>>>>>>>>>>> still work. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> I would however rephrase your last point. Late binding >>>>>>>>>>>>>>> applies to the view catalog name and by extension to all >>>>>>>>>>>>>>> partial table >>>>>>>>>>>>>>> references when no "default-catalog" is present. Resolving the >>>>>>>>>>>>>>> view catalog >>>>>>>>>>>>>>> name at query time is not opposed to storing the view metadata >>>>>>>>>>>>>>> in a catalog. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Or maybe I don't entirely understand what you mean. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Thanks >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Jan >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> On 4/24/25 00:32, Walaa Eldin Moustafa wrote: >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Hi Jan, >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> > The view is executed when it's being referenced in a >>>>>>>>>>>>>>> SQL statement.
That statement contains the information for the >>>>>>>>>>>>>>> query engine >>>>>>>>>>>>>>> to resolve the catalog of the view. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> If I’m understanding correctly, that means: >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> * If the view is queried as SELECT * FROM >>>>>>>>>>>>>>> catalogA.namespace.view, then catalogA is considered the view’s >>>>>>>>>>>>>>> catalog. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> * If the same view is later queried as SELECT * FROM >>>>>>>>>>>>>>> catalogB.namespace.view (after renaming catalogA to catalogB, >>>>>>>>>>>>>>> and keeping >>>>>>>>>>>>>>> everything else the same), then catalogB becomes the view’s >>>>>>>>>>>>>>> catalog. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Is that interpretation correct? If so, it sounds to me >>>>>>>>>>>>>>> like the catalog is resolved at query time, based on how the >>>>>>>>>>>>>>> view is >>>>>>>>>>>>>>> referenced, not from any stored metadata. That would imply some >>>>>>>>>>>>>>> sort of a >>>>>>>>>>>>>>> late binding behavior (similar to the proposal), as opposed to >>>>>>>>>>>>>>> using some >>>>>>>>>>>>>>> catalog that "stores" the view definition. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Thanks, >>>>>>>>>>>>>>> >>>> Walaa >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> On Tue, Apr 22, 2025 at 11:01 AM Jan Kaul >>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Hi Walaa, >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Thanks for clarifying the aspects of non-determinism. >>>>>>>>>>>>>>> Let me try to address your questions. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> 1. This is my interpretation of the current spec: The >>>>>>>>>>>>>>> view is executed when it's being referenced in a SQL statement. >>>>>>>>>>>>>>> That >>>>>>>>>>>>>>> statement contains the information for the query engine to >>>>>>>>>>>>>>> resolve the >>>>>>>>>>>>>>> catalog of the view. 
The query engine then uses that >>>>>>>>>>>>>>> information to fetch >>>>>>>>>>>>>>> the view metadata from the catalog. It also needs to >>>>>>>>>>>>>>> temporarily keep track >>>>>>>>>>>>>>> of which catalog it used to fetch the view metadata. It can >>>>>>>>>>>>>>> then use that >>>>>>>>>>>>>>> information to resolve the table references in the view's SQL >>>>>>>>>>>>>>> definition in >>>>>>>>>>>>>>> case no default catalog is specified. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> 2. The important part is that the catalog can be >>>>>>>>>>>>>>> referenced at execution time. As long as that's the case I >>>>>>>>>>>>>>> would assume the >>>>>>>>>>>>>>> view can be created in any catalog. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> I think your point is really valuable because the >>>>>>>>>>>>>>> current specification can lead to some unintuitive behavior. >>>>>>>>>>>>>>> For example >>>>>>>>>>>>>>> for the following statement: >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> CREATE VIEW catalogA.sales.monthly_orders AS SELECT * >>>>>>>>>>>>>>> from sales.orders; >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> If the session default catalog is not "catalogA", the >>>>>>>>>>>>>>> "sales.orders" in the view query would not be the same as just >>>>>>>>>>>>>>> referencing >>>>>>>>>>>>>>> "sales.orders" in a normal SQL statement. This is because >>>>>>>>>>>>>>> without a >>>>>>>>>>>>>>> "default-catalog", the catalog name of "sales.orders" would >>>>>>>>>>>>>>> default to >>>>>>>>>>>>>>> "catalogA". >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> However, I like the current design of the view spec, >>>>>>>>>>>>>>> because it has the "closure" property.
Because of the fact that >>>>>>>>>>>>>>> the "view >>>>>>>>>>>>>>> catalog" has to be known when executing a view, all the >>>>>>>>>>>>>>> information >>>>>>>>>>>>>>> required to resolve the table identifiers is contained in the >>>>>>>>>>>>>>> view metadata >>>>>>>>>>>>>>> (and the "view catalog"). I think that if you make the >>>>>>>>>>>>>>> identifier >>>>>>>>>>>>>>> resolution dependent on external parameters, it hinders >>>>>>>>>>>>>>> portability. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Thanks, >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Jan >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> On 4/22/25 18:36, Walaa Eldin Moustafa wrote: >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Hi Jan, >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Thanks for the thoughtful feedback. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> I think it’s important we clarify a key point before >>>>>>>>>>>>>>> going deeper: >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Non-determinism is not caused by session fallback >>>>>>>>>>>>>>> behavior—it’s a fundamental limitation of using table >>>>>>>>>>>>>>> identifiers alone, >>>>>>>>>>>>>>> regardless of whether we use the current rule, the proposed >>>>>>>>>>>>>>> fallback to the >>>>>>>>>>>>>>> session’s default catalog, or even early vs. late binding. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> The same fully qualified identifier (e.g., >>>>>>>>>>>>>>> catalogA.namespace.table) can resolve to different objects >>>>>>>>>>>>>>> depending solely >>>>>>>>>>>>>>> on engine-specific routing logic or catalog aliases. So >>>>>>>>>>>>>>> determinism isn’t >>>>>>>>>>>>>>> guaranteed just because an identifier is "fully qualified." The >>>>>>>>>>>>>>> only >>>>>>>>>>>>>>> reliable anchor for identity is the UUID. That’s why the >>>>>>>>>>>>>>> proposed use of >>>>>>>>>>>>>>> UUIDs is not just a hardening strategy. It’s the actual fix for >>>>>>>>>>>>>>> correctness. 
>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> To move the conversation forward, could you help >>>>>>>>>>>>>>> clarify two things in the context of the current spec: >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> * Where in the metadata is the “view catalog” stored, >>>>>>>>>>>>>>> so that an engine knows to fall back to it if default-catalog >>>>>>>>>>>>>>> is null? >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> * Are we even allowed to create views in the session's >>>>>>>>>>>>>>> default catalog (i.e., without specifying a catalog) in the >>>>>>>>>>>>>>> current Iceberg >>>>>>>>>>>>>>> spec? >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> These questions are important because if we can’t >>>>>>>>>>>>>>> unambiguously recover the "view catalog" from metadata, then >>>>>>>>>>>>>>> defaulting to >>>>>>>>>>>>>>> it is problematic. And if views can't be created in the default >>>>>>>>>>>>>>> catalog, >>>>>>>>>>>>>>> then the fallback rule doesn’t generalize. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> Thanks, >>>>>>>>>>>>>>> >>>>> Walaa. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> On Tue, Apr 22, 2025 at 3:14 AM Jan Kaul >>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Hi Walaa, >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> thank you for your proposal. If I understood >>>>>>>>>>>>>>> correctly, your proposal is composed of three parts: >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> - session default catalog as fallback for >>>>>>>>>>>>>>> "default-catalog" >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> - session default namespace as fallback for >>>>>>>>>>>>>>> "default-namespace" >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> - Late binding + UUID validation >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> I have some comments regarding these points. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> 1. 
Session default catalog as fallback for >>>>>>>>>>>>>>> "default-catalog" >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Introducing a behavior that depends on the current >>>>>>>>>>>>>>> session setup is in my opinion the definition of >>>>>>>>>>>>>>> "non-determinism". You >>>>>>>>>>>>>>> could be running the same query-engine and catalog-setup on >>>>>>>>>>>>>>> different days, >>>>>>>>>>>>>>> with different default session catalogs (which is rather >>>>>>>>>>>>>>> common), and would >>>>>>>>>>>>>>> be getting different results. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Whereas with the current behavior, the view always >>>>>>>>>>>>>>> produces the same results. The current behavior has some rough >>>>>>>>>>>>>>> edges in >>>>>>>>>>>>>>> very niche use cases but I think it is solid for most use cases. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> 2. Session default namespace as fallback for >>>>>>>>>>>>>>> "default-namespace" >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Similar to the above. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> 3. Late binding + UUID validation >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> If I understand it correctly, the current >>>>>>>>>>>>>>> implementation already uses late binding. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Generally, having UUID validation makes the setup >>>>>>>>>>>>>>> more robust. Which is great. However, having UUID validation >>>>>>>>>>>>>>> still requires >>>>>>>>>>>>>>> us to have a portable table identifier specification. Even if >>>>>>>>>>>>>>> we have the >>>>>>>>>>>>>>> UUIDs of the referenced tables from the view, there simply >>>>>>>>>>>>>>> isn't an >>>>>>>>>>>>>>> interface that lets us use those UUIDs. The catalog interface >>>>>>>>>>>>>>> is defined >>>>>>>>>>>>>>> in terms of table identifiers. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> So we always require a working catalog setup and >>>>>>>>>>>>>>> suitable table identifiers to obtain the table metadata. 
We can >>>>>>>>>>>>>>> use the >>>>>>>>>>>>>>> UUIDs to verify if we loaded the correct table. But this can >>>>>>>>>>>>>>> only be done >>>>>>>>>>>>>>> after we used some identifier. Which means there is no way of >>>>>>>>>>>>>>> using UUIDs >>>>>>>>>>>>>>> without a functioning catalog/identifier setup. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> In conclusion, I prefer the current behavior for >>>>>>>>>>>>>>> "default-catalog" because it is more deterministic in my >>>>>>>>>>>>>>> opinion. And I >>>>>>>>>>>>>>> think the current spec does a good job for multi-engine table >>>>>>>>>>>>>>> identifier >>>>>>>>>>>>>>> resolution. I see the UUID validation more of an additional >>>>>>>>>>>>>>> hardening >>>>>>>>>>>>>>> strategy. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Thanks >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Jan >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> On 4/21/25 17:38, Walaa Eldin Moustafa wrote: >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Thanks Renjie! >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> The existing spec has some guidance on resolving >>>>>>>>>>>>>>> catalogs on the fly already (to address the case of view text >>>>>>>>>>>>>>> with table >>>>>>>>>>>>>>> identifiers missing the catalog part). The guidance is to use >>>>>>>>>>>>>>> the catalog >>>>>>>>>>>>>>> where the view is stored. But I find this rule hard to >>>>>>>>>>>>>>> interpret or use. >>>>>>>>>>>>>>> The catalog itself is a logical construct—such as a federated >>>>>>>>>>>>>>> catalog that >>>>>>>>>>>>>>> delegates to multiple physical backends (e.g., HMS and REST). >>>>>>>>>>>>>>> In such >>>>>>>>>>>>>>> cases, the catalog (e.g., `my_catalog` in >>>>>>>>>>>>>>> `my_catalog.namespace1.table1`) >>>>>>>>>>>>>>> doesn’t physically store the tables; it only routes requests to >>>>>>>>>>>>>>> underlying >>>>>>>>>>>>>>> stores. 
Therefore, defaulting identifier resolution based on >>>>>>>>>>>>>>> the catalog >>>>>>>>>>>>>>> where the view is "stored" doesn’t align with how catalogs >>>>>>>>>>>>>>> actually behave >>>>>>>>>>>>>>> in practice. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>> Walaa. >>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>> >>>>>> On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu < >>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>> >>>>>>> Hi, Walaa: >>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>> >>>>>>> Thanks for the proposal. >>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>> >>>>>>> I've reviewed the doc, but in general I have some >>>>>>>>>>>>>>> concerns with resolving catalog names on the fly with query >>>>>>>>>>>>>>> engine defined >>>>>>>>>>>>>>> catalog names. This introduces some flexibility at first >>>>>>>>>>>>>>> glance, but also >>>>>>>>>>>>>>> makes misconfiguration difficult to explain. >>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>> >>>>>>> But I agree with one part: that we should store the >>>>>>>>>>>>>>> resolved table UUID in view metadata, as table/view renaming >>>>>>>>>>>>>>> may introduce >>>>>>>>>>>>>>> errors that are difficult for users to understand. >>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>> >>>>>>> On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin Moustafa >>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> Hi Everyone, >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> Looking forward to keeping up the momentum and >>>>>>>>>>>>>>> closing out the MV spec as well. I’m hoping we can proceed to a >>>>>>>>>>>>>>> vote next >>>>>>>>>>>>>>> week. >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> Here is a summary in case that helps. The proposal >>>>>>>>>>>>>>> outlines a strategy for handling table identifiers in Iceberg >>>>>>>>>>>>>>> view >>>>>>>>>>>>>>> metadata, with the goal of ensuring correctness, portability, >>>>>>>>>>>>>>> and engine >>>>>>>>>>>>>>> compatibility. 
It recommends resolving table identifiers at >>>>>>>>>>>>>>> read time (late >>>>>>>>>>>>>>> binding) rather than creation time, and introduces UUID-based >>>>>>>>>>>>>>> validation to >>>>>>>>>>>>>>> maintain identity guarantees across engines, or sessions. It >>>>>>>>>>>>>>> also revises >>>>>>>>>>>>>>> how default-catalog and default-namespace are handled >>>>>>>>>>>>>>> (defaulting both to >>>>>>>>>>>>>>> the session context if not explicitly set) to better align with >>>>>>>>>>>>>>> engine >>>>>>>>>>>>>>> behavior and improve cross-engine interoperability. >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> Please let me know your thoughts. >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>> Walaa. >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>> >>>>>>>> On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin >>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> Thanks Eduard and Sung! I have addressed the >>>>>>>>>>>>>>> comments. >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> One key point to keep in mind is that catalog >>>>>>>>>>>>>>> names in the spec refer to logical catalogs—i.e., the first >>>>>>>>>>>>>>> part of a >>>>>>>>>>>>>>> three-part identifier. These correspond to Spark's DataSourceV2 >>>>>>>>>>>>>>> catalogs, >>>>>>>>>>>>>>> Trino connectors, and similar constructs. This is a level of >>>>>>>>>>>>>>> abstraction >>>>>>>>>>>>>>> above physical catalogs, which are not referenced or used in >>>>>>>>>>>>>>> the view spec. >>>>>>>>>>>>>>> The reason is that table identifiers in the view >>>>>>>>>>>>>>> definition/text itself >>>>>>>>>>>>>>> refer to logical catalogs, not physical ones (since they >>>>>>>>>>>>>>> interface directly >>>>>>>>>>>>>>> with the engine and not a specific metastore). >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>> Walaa. 
>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> On Wed, Apr 16, 2025 at 6:15 AM Sung Yun < >>>>>>>>>>>>>>> sungwy...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>> Thank you Walaa for the proposal. I think view >>>>>>>>>>>>>>> portability is a very important topic for us to continue >>>>>>>>>>>>>>> discussing as it >>>>>>>>>>>>>>> relies on many assumptions within the data ecosystem for it to >>>>>>>>>>>>>>> function >>>>>>>>>>>>>>> like you've highlighted well in the document. >>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>> I've added a few comments around how this may >>>>>>>>>>>>>>> impact the permission questions the engines will be asking, and >>>>>>>>>>>>>>> whether >>>>>>>>>>>>>>> that is the desired behavior. >>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>> Sung >>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>> On Wed, Apr 16, 2025 at 7:32 AM Eduard >>>>>>>>>>>>>>> Tudenhöfner <etudenhoef...@apache.org> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>> Thanks Walaa for tackling this problem. I've >>>>>>>>>>>>>>> added a few comments to get a better understanding of how this >>>>>>>>>>>>>>> will look >>>>>>>>>>>>>>> like in the actual implementation. >>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>> Eduard >>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 15, 2025 at 7:09 PM Walaa Eldin >>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Everyone, >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> Starting this thread to resume our discussion >>>>>>>>>>>>>>> on how to reference table identifiers from Iceberg metadata, a >>>>>>>>>>>>>>> key aspect >>>>>>>>>>>>>>> of the view specification, particularly in relation to the MV >>>>>>>>>>>>>>> (materialized >>>>>>>>>>>>>>> view) extensions. 
>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> I had the chance to speak offline with a few >>>>>>>>>>>>>>> community members to better understand how the current spec is >>>>>>>>>>>>>>> being >>>>>>>>>>>>>>> interpreted. Those conversations served as inputs to a new >>>>>>>>>>>>>>> proposal on how >>>>>>>>>>>>>>> table identifier references could be represented in metadata. >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> You can find the proposal here [1]. I look >>>>>>>>>>>>>>> forward to your feedback and working together to move this >>>>>>>>>>>>>>> forward so we >>>>>>>>>>>>>>> can finalize the MV spec as well. >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>>>>> https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0 >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>> Walaa. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>
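For readers following the thread, the two fallback rules under debate, and the after-the-fact UUID check that Jan describes, can be sketched roughly as below. This is only an illustration under stated assumptions: `ViewMetadata`, `Session`, `resolve`, `check_uuid`, and the rule names "current-spec" and "proposal" are hypothetical and are not part of any Iceberg API. The sketch also treats any identifier with three or more parts as already catalog-qualified, which real engines with multi-level namespaces may not do.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ViewMetadata:
    """Hypothetical stand-in for the two view metadata fields discussed."""
    default_catalog: Optional[str]
    default_namespace: Optional[Tuple[str, ...]]


@dataclass(frozen=True)
class Session:
    """Hypothetical engine session state (e.g. as set by a USE command)."""
    current_catalog: str
    current_namespace: Tuple[str, ...]


def resolve(ident: str, view: ViewMetadata, view_catalog: str,
            session: Session, rule: str) -> Tuple[str, ...]:
    """Return a catalog-qualified identifier for a table named in view SQL.

    rule="current-spec": a missing default-catalog falls back to the catalog
                         the view itself was loaded from.
    rule="proposal":     missing defaults fall back to the session context.
    """
    parts = tuple(ident.split("."))
    if len(parts) >= 3:
        # Simplifying assumption: three or more parts means the catalog
        # is already present in the identifier.
        return parts
    if rule == "current-spec":
        catalog = view.default_catalog or view_catalog
        namespace = view.default_namespace or ()
    elif rule == "proposal":
        catalog = view.default_catalog or session.current_catalog
        namespace = view.default_namespace or session.current_namespace
    else:
        raise ValueError(f"unknown rule: {rule}")
    if len(parts) == 1:
        # Bare table name: prepend the default namespace.
        parts = namespace + parts
    return (catalog,) + parts


def check_uuid(expected: Optional[str], loaded: str) -> None:
    """Validate a table only after it was loaded via an identifier.

    This mirrors the point made in the thread: the UUID can confirm that the
    right table was loaded, but it cannot replace the identifier lookup.
    """
    if expected is not None and expected != loaded:
        raise ValueError(f"table UUID mismatch: {expected} != {loaded}")
```

With Jan's example view `catalogA.sales.monthly_orders` (no "default-catalog", "default-namespace" of `sales`) and a session whose current catalog is a hypothetical `catalogB`, `resolve("sales.orders", ...)` yields `("catalogA", "sales", "orders")` under "current-spec" and `("catalogB", "sales", "orders")` under "proposal", which is the behavioral difference the thread is arguing about.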