> I think that's the lesser evil compared to Iceberg specifying how engines should resolve identifiers
I think this is also similar to the previous point. It is the other way around. Right now the spec dictates how to resolve identifiers (through a view-specific `default-catalog` field). The proposal suggests getting out of this space and letting engines handle it the same way they handle all other identifiers.

On Wed, Apr 30, 2025 at 5:07 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

> > I thought "default-catalog" could be set via the USE command.
>
> Benny, I think this is a misconception or miscommunication. The USE command has no impact on the `default-catalog` field. In fact, the proposal's direction is exactly to establish that the USE command should influence how tables are resolved, the same as everywhere else. Right now that is not the case under the current spec.
>
> On Wed, Apr 30, 2025 at 3:17 PM Benny Chow <btc...@gmail.com> wrote:
>
>> > there is no SQL construct today to explicitly set default-catalog
>>
>> I thought "default-catalog" could be set via the USE command.
>>
>> I generally agree with Dan about requiring consistent catalog names. I think that's the lesser evil compared to Iceberg specifying how engines should resolve identifiers. Another thing to consider is that identifier resolution can be very expensive at query validation time if identifiers need to be looked up from a bunch of places. Hopefully, it should be possible to define a view in such a way that identifiers can be resolved on the first try.
>>
>> Benny
>>
>> On Tue, Apr 29, 2025 at 10:29 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>
>>> Hi Rishabh,
>>>
>>> You're right that the proposal touches on two aspects, and resolution rules are one of them. The other aspect is the proposal's position that table identifiers should be stored in metadata exactly as they appear in the view text (e.g., even if they're two-part or partially qualified), along with their corresponding UUIDs for validation.
>>> This applies both to referenced input tables and the storage table identifier in materialized views.
>>>
>>> We may be able to converge on this storage format even if we haven't yet converged on the resolution fallback rules. I believe both resolution strategies currently being discussed would still lead to storing identifiers in this way.
>>>
>>> I'm supportive of moving forward with consensus on the identifier storage format. That said, we may continue to run into questions related to resolution during implementation. For example: Should the storage table identifier follow the same default-catalog and default-namespace resolution behavior as other table references?
>>>
>>> Thanks,
>>> Walaa.
>>>
>>> On Tue, Apr 29, 2025 at 10:07 PM Rishabh Bhatia <bhatiarishab...@gmail.com> wrote:
>>>
>>>> Hello Walaa,
>>>>
>>>> Thanks for starting this discussion.
>>>>
>>>> I think we should decouple at least the MV Spec from the proposal to change the current behavior of view resolution.
>>>>
>>>> We can continue having the discussion if the current view spec needs to be changed or not. Based on the decision at a later point if required we can update the view resolution rule.
>>>>
>>>> Thanks,
>>>> Rishabh
>>>>
>>>> On Mon, Apr 28, 2025 at 3:22 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>
>>>>> Correction of typo: both engines seem to set default-catalog to the view catalog if it is defined, or to null if the view catalog is not defined.
>>>>>
>>>>> On Mon, Apr 28, 2025 at 3:06 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> Thanks again for your response.
>>>>>>
>>>>>> I agree that catalog renaming is an environmental event, but it's a real one that happens frequently in practice.
>>>>>> Saying that the Iceberg spec cannot accommodate something as common as catalog renaming feels very restrictive, and could make the spec less practical, even unusable, for real-world deployments.
>>>>>> I’m sharing this from the perspective of a large data lake environment where views are heavily deployed and operationalized.
>>>>>>
>>>>>> Further, it's worth noting that the table spec is resilient to catalog renaming, but the view spec is not. If we have an opportunity to make the view spec similarly resilient, I wonder why not?
>>>>>> Both specifications are deterministic in their definition, but one is more fragile to environmental changes than the other. Improving resilience does not sacrifice determinism. It simply makes views safer and more portable over time.
>>>>>>
>>>>>> Separately, given that there is no SQL construct today to explicitly set default-catalog at creation time, what is the intuition behind how engines like Spark and Trino currently assign default-catalog?
>>>>>> Today, both engines seem to set default-catalog to null if the view catalog is defined, or to the view catalog if not.
>>>>>> What was the intended thought process behind this behavior?
>>>>>>
>>>>>> Thanks,
>>>>>> Walaa
>>>>>>
>>>>>> On Mon, Apr 28, 2025 at 1:33 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>>>>
>>>>>>> Walaa,
>>>>>>>
>>>>>>> > tables inside views remain reachable after a catalog rename
>>>>>>>
>>>>>>> This problem stems from the exact environmental/configuration issue that we should not be trying to address. I don't think we would expect references to survive a catalog rename. That's not something covered by the spec and needs to be handled separately as a platform-level migration specific to the affected environment.
>>>>>>>
>>>>>>> The identifier resolution logic is clear and deterministic.
>>>>>>> It should not matter whether an engine resolves and encodes the default-catalog or leaves it to the resolution rules.
>>>>>>>
>>>>>>> The issue isn't with how the spec is defined, but rather view behavior when you start altering the environment around it, which isn't something we should be trying to define here.
>>>>>>>
>>>>>>> -Dan
>>>>>>>
>>>>>>> On Mon, Apr 28, 2025 at 12:17 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Dan,
>>>>>>>>
>>>>>>>> Thanks for chiming in.
>>>>>>>>
>>>>>>>> I believe the issues we’re seeing now go beyond just catalog naming consistency. The behavior around default-catalog itself introduces resolution inconsistencies even when catalog names are consistent.
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> * When default-catalog is set to null, tables inside views remain reachable after a catalog rename. But if it is set to a non-null value, table references will break.
>>>>>>>>
>>>>>>>> * default-catalog causes table references inside views to be early bound (i.e., bound at view creation time, especially when using a non-null value), while table references inside standalone queries are late bound (bound at query time). This creates inconsistencies when resolving the same table name inside and outside views, even within the same job.
>>>>>>>>
>>>>>>>> * It causes Spark's and Trino's behavior to drift from the spec. There is no way to fully align Spark's behavior without making invasive changes to the Spark SQL grammar and the View DataSource API (specifically on the CREATE side). This challenge would extend to other engines too. Both Spark and Trino set this field based on a heuristic in today's implementation.
>>>>>>>>
>>>>>>>> * With view nesting (views depending on views), these inconsistencies amplify further, forcing users and engines to reason about catalog resolution at every level in the view tree.
>>>>>>>>
>>>>>>>> * It will be difficult to migrate Hive views to Iceberg with that model. Migrated Hive views will have to deviate from that spec.
>>>>>>>>
>>>>>>>> How would you suggest approaching the engine-level changes required to support the current default-catalog field?
>>>>>>>> Also, do you believe the Spark and Trino communities would align around having table resolution behave inconsistently between queries and views, or inconsistently between Iceberg and other types of views?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa
>>>>>>>>
>>>>>>>> On Mon, Apr 28, 2025 at 11:34 AM Daniel Weeks <dwe...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> I would agree with Jan's summary of why 'default-catalog' was introduced, but I think we need to step back and align on what we are really attempting to support in the spec.
>>>>>>>>>
>>>>>>>>> The issues we're discussing largely stem from using multiple engines with cross catalog references and configurations where catalog names are not aligned. If we have multiple engines that all have the same catalog names/configurations, the current spec implementation is well defined for table resolution even across catalogs. The 'default-catalog' (and namespace equivalent) was intended to address the resolution within the context of the sql text, not to address catalog/naming inconsistencies.
>>>>>>>>>
>>>>>>>>> I feel like we're trying to adapt the original intent to address the catalog naming/configuration and would argue that we shouldn't attempt to do that as part of the spec.
>>>>>>>>> Inconsistently named catalogs are a reality, but we should consider that a configuration/environmental issue, not something to solve for in the spec.
>>>>>>>>>
>>>>>>>>> We should support and advocate for consistency in catalog naming and define the spec along those lines. The fact is that with all of the recent work that's gone into making catalogs pluggable, it makes more sense to just register catalog configuration with consistent names (even if you have to duplicate the configuration for supporting existing readers/writers). I think it's better to provide a path toward consistency than to normalize complicated schemes to work around the issues caused by environmental/configuration inconsistencies.
>>>>>>>>>
>>>>>>>>> If the goal is to create clever ways to hack the late binding resolution to swap in different catalogs or make references contextual, I feel like that is something we should strongly discourage as it leads to confusion about what is resolved as part of the query.
>>>>>>>>>
>>>>>>>>> At this point, I don't see a good argument to add additional configuration or change the resolution behaviors.
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>> On Mon, Apr 28, 2025 at 12:40 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> I think the intention with the "default-catalog" was that every query engine uses it to store its session default catalog at the time of creating the view. This way the view could be reused in another session. The idea was not to introduce an additional SQL syntax to set the default-catalog.
>>>>>>>>>>
>>>>>>>>>> Generally we have different environments we want to support with the view spec:
>>>>>>>>>>
>>>>>>>>>> 1. Consistent catalog naming
>>>>>>>>>>
>>>>>>>>>> When the environment supports it, using consistent catalog names can have a great benefit for multi-catalog, multi-engine setups. With consistent catalog names, using the "default-catalog" field works without any issues.
>>>>>>>>>>
>>>>>>>>>> 2. Inconsistent catalog naming
>>>>>>>>>>
>>>>>>>>>> This can be the case when different query engines refer to the same physical catalog by different names. This often happens because different query engines use different strategies to set up the catalogs. If catalogs have inconsistent naming, using the "default-catalog" field does not work because it is not guaranteed that the catalog name can be resolved with another engine. Using the "view catalog" as a fallback is a better solution for this use case, as it avoids catalog names altogether. It is however limited to table references in the same catalog.
>>>>>>>>>>
>>>>>>>>>> What do you think of introducing a view property that specifies if the "default-catalog" or the "view catalog" should be used? This way, you could use the "default-catalog" in environments where you can guarantee consistent naming, but you would be able to directly fall back to the "view-catalog" when you don't have consistent naming. The query engines could set the default for this view property at creation time. Spark for example could set it to automatically use the "view catalog".
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Jan
>>>>>>>>>>
>>>>>>>>>> On 4/26/25 05:33, Walaa Eldin Moustafa wrote:
>>>>>>>>>>
>>>>>>>>>> To help folks catch up on the latest discussions and interpretation of the spec, I have summarized everything we discussed so far at the top of the proposal document (here <https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0>). I have slightly updated the proposal to be in sync with the new interpretation to avoid confusion. In summary:
>>>>>>>>>>
>>>>>>>>>> * Remove default-catalog and default-namespace fields from the view spec completely.
>>>>>>>>>>
>>>>>>>>>> * Hence, we do not attempt to define separate view-level default catalogs or namespaces.
>>>>>>>>>>
>>>>>>>>>> Instead:
>>>>>>>>>>
>>>>>>>>>> * If a table identifier inside a view lacks a catalog qualifier, engines should resolve it using the current engine catalog at query time.
>>>>>>>>>>
>>>>>>>>>> * Reference table identifiers in the metadata exactly as they appear in the view SQL text.
>>>>>>>>>>
>>>>>>>>>> * If an identifier lacks the catalog part at creation, it should still lack a catalog in the stored metadata.
>>>>>>>>>>
>>>>>>>>>> * Store UUIDs alongside table identifiers whenever possible.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Walaa.
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 25, 2025 at 5:18 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the contribution Benny! +1 to the confusion the fallback creates. Also just to be clear, at this point and after clarifying the current spec intentions, I am convinced that we should remove the default catalog and default namespace fields altogether.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Walaa.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:13 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'd like to contribute my opinions on this:
>>>>>>>>>>>>
>>>>>>>>>>>> - I don't particularly like the current behavior of "default to the view's catalog when default-catalog is not set". Fundamentally, I believe the intent of default-catalog and default-namespace is there to help users write more concise SQL.
>>>>>>>>>>>> - spark session catalog is engine specific and I don't think we should design something that says first use this catalog, then that catalog.. or that catalog. For example, resolving identifiers using default-catalog -> view's catalog -> session catalog is not good.
>>>>>>>>>>>> - We gotta support non-Iceberg tables otherwise I see no value in putting views in the catalog to share with other engines
>>>>>>>>>>>> - Interoperability between different engine types is very hard due to dialect issues... so I think we should focus on supporting different clusters of the same engine type on a shared catalog. For example, AI and BI clusters on Spark sharing the same views in a REST catalog.
>>>>>>>>>>>>
>>>>>>>>>>>> Coincidentally, I think the ultimate solution is along the lines of something Russell proposed last year:
>>>>>>>>>>>>
>>>>>>>>>>>> https://lists.apache.org/thread/hoskfx8y3kvrcww52l4w9dxghp3pnlm7
>>>>>>>>>>>>
>>>>>>>>>>>> We've been looking at this interoperable identifier problem through the lens of catalog resolution but maybe the right approach is really about templating.
>>>>>>>>>>>>
>>>>>>>>>>>> I would extend Russell's idea to allow identifiers in a view to span catalogs to support non-Iceberg tables.
>>>>>>>>>>>> Also, the default-catalog property could be templated as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>> Benny
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 25, 2025 at 4:02 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Steven! How do you recommend making Spark implementation conform to the spec? Do we need Spark SQL extensions and/or Spark catalog APIs for that?
>>>>>>>>>>>>>
>>>>>>>>>>>>> How do you recommend reconciling the inconsistencies I shared regarding many resolution methods not consistently being followed in different scenarios (view vs child table resolution, query vs view resolution)? Note these occur when the default catalog is set to a non-null value. If it helps, I can share concrete examples.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 3:52 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The core issue is on the fallback behavior when `default-catalog` is not defined. Current view spec says the fallback should be the catalog where the view is defined. It doesn't really matter what the catalog is named (catalogX) by the read engine.
>>>>>>>>>>>>>> - If a view refers to the tables in the same catalog, this is a non-ambiguous and reasonable fallback behavior.
>>>>>>>>>>>>>> - If a view refers to tables from another catalog, catalog names should be included in the reference name already. So no ambiguity there either.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Potential inconsistent naming of catalog is a separate problem, which Iceberg view spec probably cannot solve. We can only recommend that catalog should be named consistently across usage for better interoperability on name references.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This proposal is to change the fallback behavior to engine's session default catalog. I am not sure it is better than the current fallback behavior.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Today’s Spark behavior explicitly differs from this idea. Spark resolves table identifiers during view creation using the session’s default catalog, not a supplied `default-catalog`.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would argue that is a Spark implementation issue for not conforming to the spec.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 1:17 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Hi Jan,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks again for continuing the discussion. I want to highlight a few fundamental issues around the interpretation of default-catalog:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Here is the real catch:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > * default-catalog cannot logically be defined at view creation time. It would be circular: the view needs to exist before its metadata (and hence default-catalog) can exist. This is visible in Spark’s implementation, where `default-catalog` is not used.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > * Introducing a creation-time default-catalog setting would require extending SQL syntax and engine APIs to promote it to a first-class view concept. This would be intrusive, non-intuitive, and realistically very difficult to standardize across engines.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > * Today’s Spark behavior explicitly differs from this idea. Spark resolves table identifiers during view creation using the session’s default catalog, not a supplied `default-catalog`.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > * Hypothetically even if we patched in a creation-time default-catalog, it would create an inconsistent binding model between tables vs views (early vs late), and between tables in views and in queries (again early vs late). For example, views and tables in queries can withstand default catalog renames, but tables cannot when they are used inside views -- it even applies to views inside views, which makes this very hard to reason about considering nesting.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>> > Walaa
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Fri, Apr 25, 2025 at 7:00 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> @Walaa:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> I would argue that when you run a CREATE VIEW statement the query engine knows which catalog the view is being created in. So even though we typically use late binding to resolve the view catalog at query time, it can also be used at creation time.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> The query engine would need to keep track of the "view catalog" where the view is going to be created in. It can use that catalog to resolve partial table identifiers if "default-catalog" is not set.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> It can lead to some unintuitive behavior, where partial identifiers in the view query resolve to a different catalog compared to using them outside of a view.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> CREATE VIEW catalogA.sales.monthly_orders AS SELECT * from sales.orders;
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> If the session default catalog is not "catalogA", the "sales.orders" in the view query would not be the same as just referencing "sales.orders" in a normal SQL statement. This is because without a "default-catalog", the catalog name of "sales.orders" would default to "catalogA", which is the view's catalog.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Jan
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On 4/25/25 04:05, Manu Zhang wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> For example, if we want to validate that the tables referenced in the view exist, how can we do that when default-catalog isn't defined, since the view hasn't been created or loaded yet?
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> I don't think this is related to view spec. How do we validate that a table exists without a default catalog, or do we always use the current session catalog?
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>> >> Manu
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Fri, Apr 25, 2025 at 5:59 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Hi Jan,
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> I think we still share the same understanding. Just to clarify: when I referred to late binding as “similar” to the proposal, I was acknowledging the distinction between view-level and table-level resolution. But as you noted, both follow a late binding model.
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> That said, this still raises an interesting question and a potential gap: if default-catalog is only defined at query time, how should resolution work during view creation? For example, if we want to validate that the tables referenced in the view exist, how can we do that when default-catalog isn't defined, since the view hasn't been created or loaded yet?
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>>>>> >>> Walaa.
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> On Thu, Apr 24, 2025 at 7:02 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Yes, I have the same understanding. The view catalog is resolved at query time.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> As you mentioned before, it's good to distinguish between the physical catalog and its reference used in SQL statements. The important part is that the physical catalog of the view and the tables referenced in its definition stay consistent.
>>>>>>>>>>>>>> >>>> You could create a view in a given physical catalog by referring to it as "catalogA", as in your first point. If you then, given a different setup, refer to the same physical catalog as "catalogB" in another session/environment, the behavior should still work.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> I would however rephrase your last point. Late binding applies to the view catalog name and by extension to all partial table references when no "default-catalog" is present. Resolving the view catalog name at query time is not opposed to storing the view metadata in a catalog.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Or maybe I don't entirely understand what you mean.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Jan
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> On 4/24/25 00:32, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Hi Jan,
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> > The view is executed when it's being referenced in a SQL statement. That statement contains the information for the query engine to resolve the catalog of the view.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> If I’m understanding correctly, that means:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> * If the view is queried as SELECT * FROM catalogA.namespace.view, then catalogA is considered the view’s catalog.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> * If the same view is later queried as SELECT * FROM catalogB.namespace.view (after renaming catalogA to catalogB, and keeping everything else the same), then catalogB becomes the view’s catalog.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Is that interpretation correct?
>>>>>>>>>>>>>> >>>> If so, it sounds to me like the catalog is resolved at query time, based on how the view is referenced, not from any stored metadata. That would imply some sort of a late binding behavior (similar to the proposal), as opposed to using some catalog that "stores" the view definition.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>> >>>> Walaa
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> On Tue, Apr 22, 2025 at 11:01 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Hi Walaa,
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Thanks for clarifying the aspects of non-determinism. Let me try to address your questions.
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> 1. This is my interpretation of the current spec: The view is executed when it's being referenced in a SQL statement. That statement contains the information for the query engine to resolve the catalog of the view. The query engine then uses that information to fetch the view metadata from the catalog. It also needs to temporarily keep track of which catalog it used to fetch the view metadata. It can then use that information to resolve the table references in the view's SQL definition in case no default catalog is specified.
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> 2. The important part is that the catalog can be referenced at execution time. As long as that's the case I would assume the view can be created in any catalog.
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> I think your point is really valuable because the current specification can lead to some unintuitive behavior. For example for the following statement:
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> CREATE VIEW catalogA.sales.monthly_orders AS SELECT * from sales.orders;
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> If the session default catalog is not "catalogA", the "sales.orders" in the view query would not be the same as just referencing "sales.orders" in a normal SQL statement. This is because without a "default-catalog", the catalog name of "sales.orders" would default to "catalogA".
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> However, I like the current design of the view spec, because it has the "closure" property. Because of the fact that the "view catalog" has to be known when executing a view, all the information required to resolve the table identifiers is contained in the view metadata (and the "view catalog"). I think that if you make the identifier resolution dependent on external parameters, it hinders portability.
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Jan
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> On 4/22/25 18:36, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Hi Jan,
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> Thanks for the thoughtful feedback.

I think it’s important we clarify a key point before going deeper:

Non-determinism is not caused by session fallback behavior; it’s a fundamental limitation of using table identifiers alone, regardless of whether we use the current rule, the proposed fallback to the session’s default catalog, or even early vs. late binding.

The same fully qualified identifier (e.g., catalogA.namespace.table) can resolve to different objects depending solely on engine-specific routing logic or catalog aliases. So determinism isn’t guaranteed just because an identifier is "fully qualified." The only reliable anchor for identity is the UUID. That’s why the proposed use of UUIDs is not just a hardening strategy. It’s the actual fix for correctness.

To move the conversation forward, could you help clarify two things in the context of the current spec:

* Where in the metadata is the “view catalog” stored, so that an engine knows to fall back to it if default-catalog is null?

* Are we even allowed to create views in the session's default catalog (i.e., without specifying a catalog) in the current Iceberg spec?

These questions are important because if we can’t unambiguously recover the "view catalog" from metadata, then defaulting to it is problematic.
And if views can't be created in the default catalog, then the fallback rule doesn’t generalize.

Thanks,
Walaa.

On Tue, Apr 22, 2025 at 3:14 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:

Hi Walaa,

thank you for your proposal. If I understood correctly, your proposal is composed of three parts:

- session default catalog as fallback for "default-catalog"

- session default namespace as fallback for "default-namespace"

- Late binding + UUID validation

I have some comments regarding these points.

1. Session default catalog as fallback for "default-catalog"

Introducing a behavior that depends on the current session setup is in my opinion the definition of "non-determinism". You could be running the same query engine and catalog setup on different days, with different default session catalogs (which is rather common), and would be getting different results.

Whereas with the current behavior, the view always produces the same results. The current behavior has some rough edges in very niche use cases, but I think it is solid for most use cases.

2.
Session default namespace as fallback for "default-namespace"

Similar to the above.

3. Late binding + UUID validation

If I understand it correctly, the current implementation already uses late binding.

Generally, having UUID validation makes the setup more robust, which is great. However, having UUID validation still requires us to have a portable table identifier specification. Even if we have the UUIDs of the referenced tables from the view, there simply isn't an interface that lets us use those UUIDs. The catalog interface is defined in terms of table identifiers.

So we always require a working catalog setup and suitable table identifiers to obtain the table metadata. We can use the UUIDs to verify whether we loaded the correct table, but this can only be done after we have used some identifier. That means there is no way of using UUIDs without a functioning catalog/identifier setup.

In conclusion, I prefer the current behavior for "default-catalog" because it is more deterministic in my opinion. And I think the current spec does a good job for multi-engine table identifier resolution. I see the UUID validation more as an additional hardening strategy.

Thanks

Jan

On 4/21/25 17:38, Walaa Eldin Moustafa wrote:

Thanks Renjie!

The existing spec has some guidance on resolving catalogs on the fly already (to address the case of view text with table identifiers missing the catalog part). The guidance is to use the catalog where the view is stored. But I find this rule hard to interpret or use. The catalog itself is a logical construct, such as a federated catalog that delegates to multiple physical backends (e.g., HMS and REST). In such cases, the catalog (e.g., `my_catalog` in `my_catalog.namespace1.table1`) doesn’t physically store the tables; it only routes requests to underlying stores. Therefore, defaulting identifier resolution based on the catalog where the view is "stored" doesn’t align with how catalogs actually behave in practice.

Thanks,
Walaa.

On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu <liurenjie2...@gmail.com> wrote:

Hi, Walaa:

Thanks for the proposal.

I've reviewed the doc, but in general I have some concerns with resolving catalog names on the fly with query engine defined catalog names.
This introduces some flexibility at first glance, but also makes misconfiguration difficult to explain.

But I agree with one part: we should store the resolved table UUID in view metadata, as table/view renaming may introduce errors that are difficult for users to understand.

On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Hi Everyone,

Looking forward to keeping up the momentum and closing out the MV spec as well. I’m hoping we can proceed to a vote next week.

Here is a summary in case that helps. The proposal outlines a strategy for handling table identifiers in Iceberg view metadata, with the goal of ensuring correctness, portability, and engine compatibility. It recommends resolving table identifiers at read time (late binding) rather than creation time, and introduces UUID-based validation to maintain identity guarantees across engines or sessions. It also revises how default-catalog and default-namespace are handled (defaulting both to the session context if not explicitly set) to better align with engine behavior and improve cross-engine interoperability.

Please let me know your thoughts.

Thanks,
Walaa.

On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Thanks Eduard and Sung! I have addressed the comments.

One key point to keep in mind is that catalog names in the spec refer to logical catalogs, i.e., the first part of a three-part identifier. These correspond to Spark's DataSourceV2 catalogs, Trino connectors, and similar constructs. This is a level of abstraction above physical catalogs, which are not referenced or used in the view spec. The reason is that table identifiers in the view definition/text itself refer to logical catalogs, not physical ones (since they interface directly with the engine and not a specific metastore).

Thanks,
Walaa.

On Wed, Apr 16, 2025 at 6:15 AM Sung Yun <sungwy...@gmail.com> wrote:

Thank you Walaa for the proposal. I think view portability is a very important topic for us to continue discussing, as it relies on many assumptions within the data ecosystem in order to function, as you've highlighted well in the document.

I've added a few comments around how this may impact the permission questions the engines will be asking, and whether that is the desired behavior.

Sung

On Wed, Apr 16, 2025 at 7:32 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:

Thanks Walaa for tackling this problem. I've added a few comments to get a better understanding of what this will look like in the actual implementation.

Eduard

On Tue, Apr 15, 2025 at 7:09 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Hi Everyone,

Starting this thread to resume our discussion on how to reference table identifiers from Iceberg metadata, a key aspect of the view specification, particularly in relation to the MV (materialized view) extensions.

I had the chance to speak offline with a few community members to better understand how the current spec is being interpreted. Those conversations served as inputs to a new proposal on how table identifier references could be represented in metadata.

You can find the proposal here [1]. I look forward to your feedback and working together to move this forward so we can finalize the MV spec as well.

[1] https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0

Thanks,
Walaa.
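
For readers following the thread, the two catalog-fallback rules being debated, and the UUID check that both sides agree is useful, can be sketched roughly as below. This is only an illustration under stated assumptions, not code from Iceberg or any engine: `ViewContext`, `qualify`, `resolve_current_spec`, `resolve_proposal`, and `validate_uuid` are hypothetical names, and identifiers are assumed to be either `namespace.table` or `catalog.namespace.table` (multi-level namespaces are ignored).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewContext:
    """Hypothetical inputs an engine has when it expands a view."""
    view_catalog: str               # catalog through which the view was loaded
    default_catalog: Optional[str]  # "default-catalog" in view metadata, may be unset
    session_catalog: str            # the engine session's current catalog


def qualify(identifier: str, catalog: str) -> str:
    """Prefix a two-part identifier with a catalog; keep three-part ones as-is."""
    parts = identifier.split(".")
    return identifier if len(parts) >= 3 else f"{catalog}.{identifier}"


def resolve_current_spec(identifier: str, ctx: ViewContext) -> str:
    # Rule as described in the thread for the current spec:
    # use "default-catalog" if set, else the catalog the view was loaded from.
    return qualify(identifier, ctx.default_catalog or ctx.view_catalog)


def resolve_proposal(identifier: str, ctx: ViewContext) -> str:
    # Rule as summarized for the proposal:
    # use "default-catalog" if set, else the session's current catalog.
    return qualify(identifier, ctx.default_catalog or ctx.session_catalog)


def validate_uuid(loaded_uuid: str, expected_uuid: str) -> None:
    """Check identity after a table has been loaded by identifier."""
    if loaded_uuid != expected_uuid:
        raise ValueError(
            f"resolved table has UUID {loaded_uuid}, view expects {expected_uuid}"
        )


# Jan's example: the view lives in catalogA, the session is in another catalog,
# and no "default-catalog" is stored in the view metadata.
ctx = ViewContext(view_catalog="catalogA", default_catalog=None,
                  session_catalog="catalogB")
current = resolve_current_spec("sales.orders", ctx)
proposed = resolve_proposal("sales.orders", ctx)
```

In this sketch the same view text yields `catalogA.sales.orders` under the first rule and `catalogB.sales.orders` under the second. As Jan points out, `validate_uuid` can only run after a table has been loaded through some identifier, so it detects a wrong resolution rather than replacing resolution.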