In Spark, I believe that the USE command sets the current catalog and
namespace.  This affects both where the view is created and how unqualified
table identifiers are resolved.  I also don't see an issue with saving the
current catalog and namespace into the view metadata's default-catalog and
default-namespace fields.
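
As a rough illustration of that idea (a minimal sketch under assumptions, not
Spark or Iceberg code): the `Session` class and the `view_defaults` and
`resolve` helpers below are hypothetical names, and namespaces are assumed to
be single-level. Only the `default-catalog` and `default-namespace` field
names, and the fallback to the view's catalog when `default-catalog` is unset,
come from this thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Session:
    """Hypothetical stand-in for an engine session; not a Spark API."""
    current_catalog: str
    current_namespace: str

    def use(self, catalog: str, namespace: str) -> None:
        # Models the effect of a USE command on the session state.
        self.current_catalog = catalog
        self.current_namespace = namespace


def view_defaults(session: Session) -> dict:
    # Snapshot the session state into the two view-metadata fields.
    return {
        "default-catalog": session.current_catalog,
        "default-namespace": session.current_namespace,
    }


def resolve(parts: Tuple[str, ...], defaults: dict, view_catalog: str) -> Tuple[str, str, str]:
    """Fully qualify an identifier taken from the view SQL.

    Assumes single-level namespaces: one part is a table, two parts are
    namespace.table, three parts are catalog.namespace.table.
    """
    # Fall back to the view's catalog when default-catalog is null or absent.
    catalog: Optional[str] = defaults.get("default-catalog") or view_catalog
    if len(parts) == 1:
        return (catalog, defaults["default-namespace"], parts[0])
    if len(parts) == 2:
        return (catalog, parts[0], parts[1])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError(f"unsupported identifier: {parts!r}")


session = Session("spark_catalog", "default")
session.use("catalogA", "sales")
defaults = view_defaults(session)
print(resolve(("orders",), defaults, view_catalog="catalogA"))
```

With the defaults captured at creation time, an unqualified `orders` resolves
the same way in any later session, regardless of what that session's current
catalog is.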

On Wed, Apr 30, 2025 at 5:12 PM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> > I think that's the lesser evil compared to Iceberg specifying how
> engines should resolve identifiers
>
> I think this is also similar to the previous point. It is the other way
> around. Right now the spec dictates how to resolve (through employing a
> view-specific `default-catalog` field). The proposal is suggesting to get
> out of this space and let engines handle it similar to how they handle all
> identifiers.
>
> On Wed, Apr 30, 2025 at 5:07 PM Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
>
>> > I thought "default-catalog" could be set via the USE command.
>>
>> Benny, I think this is a misconception or miscommunication. The USE
>> command has no impact on the `default-catalog` field. In fact, the
>> proposal's direction is exactly to establish that the USE command should
>> influence how tables are resolved, the same as everywhere else. Right now
>> that is not the case under the current spec.
>>
>>
>> On Wed, Apr 30, 2025 at 3:17 PM Benny Chow <btc...@gmail.com> wrote:
>>
>>> > there is no SQL construct today to explicitly set default-catalog
>>>
>>> I thought "default-catalog" could be set via the USE command.
>>>
>>> I generally agree with Dan about requiring consistent catalog names.  I
>>> think that's the lesser evil compared to Iceberg specifying how engines
>>> should resolve identifiers.  Another thing to consider is that identifier
>>> resolution can be very expensive at query validation time if identifiers
>>> need to be looked up from a bunch of places.  Hopefully, it should be
>>> possible to define a view in such a way that identifiers can be resolved on
>>> the first try.
>>>
>>> Benny
>>>
>>> On Tue, Apr 29, 2025 at 10:29 PM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> Hi Rishabh,
>>>>
>>>> You're right that the proposal touches on two aspects, and resolution
>>>> rules are one of them. The other aspect is the proposal's position that
>>>> table identifiers should be stored in metadata exactly as they appear in
>>>> the view text (e.g., even if they're two-part or partially qualified),
>>>> along with their corresponding UUIDs for validation. This applies both to
>>>> referenced input tables and the storage table identifier in materialized
>>>> views.
>>>>
>>>> We may be able to converge on this storage format even if we haven't
>>>> yet converged on the resolution fallback rules. I believe both resolution
>>>> strategies currently being discussed would still lead to storing
>>>> identifiers in this way.
>>>>
>>>> I'm supportive of moving forward with consensus on the identifier
>>>> storage format. That said, we may continue to run into questions related to
>>>> resolution during implementation. For example: Should the storage table
>>>> identifier follow the same default-catalog and default-namespace resolution
>>>> behavior as other table references?
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>> On Tue, Apr 29, 2025 at 10:07 PM Rishabh Bhatia <
>>>> bhatiarishab...@gmail.com> wrote:
>>>>
>>>>> Hello Walaa,
>>>>>
>>>>> Thanks for starting this discussion.
>>>>>
>>>>> I think we should decouple at least the MV Spec from the proposal to
>>>>> change the current behavior of view resolution.
>>>>>
>>>>> We can continue the discussion on whether the current view spec needs
>>>>> to be changed. Depending on that decision, we can update the view
>>>>> resolution rule later if required.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Rishabh
>>>>>
>>>>> On Mon, Apr 28, 2025 at 3:22 PM Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> Correction of typo: both engines seem to set default-catalog to the
>>>>>> view catalog if it is defined, or to null if the view catalog is not
>>>>>> defined.
>>>>>>
>>>>>> On Mon, Apr 28, 2025 at 3:06 PM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> Thanks again for your response.
>>>>>>>
>>>>>>> I agree that catalog renaming is an environmental event, but it's a
>>>>>>> real one that happens frequently in practice.
>>>>>>> Saying that the Iceberg spec cannot accommodate something as common
>>>>>>> as catalog renaming feels very restrictive, and could make the spec less
>>>>>>> practical, even unusable, for real-world deployments.
>>>>>>> I’m sharing this from the perspective of a large data lake
>>>>>>> environment where views are heavily deployed and operationalized.
>>>>>>>
>>>>>>> Further, it's worth noting that the table spec is resilient to
>>>>>>> catalog renaming, but the view spec is not. If we have an opportunity to
>>>>>>> make the view spec similarly resilient, I wonder why not?
>>>>>>> Both specifications are deterministic in their definition, but one
>>>>>>> is more fragile to environmental changes than the other. Improving
>>>>>>> resilience does not sacrifice determinism. It simply makes views safer
>>>>>>> and more portable over time.
>>>>>>>
>>>>>>> Separately, given that there is no SQL construct today to explicitly
>>>>>>> set default-catalog at creation time, what is the intuition behind how
>>>>>>> engines like Spark and Trino currently assign default-catalog?
>>>>>>> Today, both engines seem to set default-catalog to null if the view
>>>>>>> catalog is defined, or to the view catalog if not.
>>>>>>> What was the intended thought process behind this behavior?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa
>>>>>>>
>>>>>>> On Mon, Apr 28, 2025 at 1:33 PM Daniel Weeks <dwe...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Walaa,
>>>>>>>>
>>>>>>>> > tables inside views remain reachable after a catalog rename
>>>>>>>>
>>>>>>>> This problem stems from the exact environmental/configuration issue
>>>>>>>> that we should not be trying to address.  I don't think we would expect
>>>>>>>> references to survive a catalog rename.  That's not something covered
>>>>>>>> by the spec and needs to be handled separately as a platform-level
>>>>>>>> migration specific to the affected environment.
>>>>>>>>
>>>>>>>> The identifier resolution logic is clear and deterministic.  It
>>>>>>>> should not matter whether an engine resolves and encodes the
>>>>>>>> default-catalog or leaves it to the resolution rules.
>>>>>>>>
>>>>>>>> The issue isn't with how the spec is defined, but rather view
>>>>>>>> behavior when you start altering the environment around it, which isn't
>>>>>>>> something we should be trying to define here.
>>>>>>>>
>>>>>>>> -Dan
>>>>>>>>
>>>>>>>> On Mon, Apr 28, 2025 at 12:17 PM Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Dan,
>>>>>>>>>
>>>>>>>>> Thanks for chiming in.
>>>>>>>>>
>>>>>>>>> I believe the issues we’re seeing now go beyond just catalog
>>>>>>>>> naming consistency. The behavior around default-catalog itself
>>>>>>>>> introduces resolution inconsistencies even when catalog names are
>>>>>>>>> consistent. For example:
>>>>>>>>>
>>>>>>>>> * When default-catalog is set to null, tables inside views remain
>>>>>>>>> reachable after a catalog rename. But if it is set to a non-null
>>>>>>>>> value, table references will break.
>>>>>>>>>
>>>>>>>>> * default-catalog causes table references inside views to be early
>>>>>>>>> bound (i.e., bound at view creation time, especially when using a
>>>>>>>>> non-null value), while table references inside standalone queries are
>>>>>>>>> late bound (bound at query time). This creates inconsistencies when
>>>>>>>>> resolving the same table name inside and outside views, even within
>>>>>>>>> the same job.
>>>>>>>>>
>>>>>>>>> * It causes Spark's and Trino's behavior to drift from the spec.
>>>>>>>>> There is no way to fully align Spark's behavior without making
>>>>>>>>> invasive changes to the Spark SQL grammar and the View DataSource API
>>>>>>>>> (specifically on the CREATE side). This challenge would extend to
>>>>>>>>> other engines too. Both Spark and Trino set this field based on a
>>>>>>>>> heuristic in today's implementation.
>>>>>>>>>
>>>>>>>>> * With view nesting (views depending on views), these
>>>>>>>>> inconsistencies amplify further, forcing users and engines to reason
>>>>>>>>> about catalog resolution at every level in the view tree.
>>>>>>>>>
>>>>>>>>> * It will be difficult to migrate Hive views to Iceberg with that
>>>>>>>>> model. Migrated Hive views will have to deviate from that spec.
>>>>>>>>>
>>>>>>>>> How would you suggest approaching the engine-level changes
>>>>>>>>> required to support the current default-catalog field?
>>>>>>>>> Also, do you believe the Spark and Trino communities would align
>>>>>>>>> around having table resolution behave inconsistently between queries
>>>>>>>>> and views, or between Iceberg and other types of views?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Walaa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Apr 28, 2025 at 11:34 AM Daniel Weeks <dwe...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I would agree with Jan's summary of why 'default-catalog' was
>>>>>>>>>> introduced, but I think we need to step back and align on what we are
>>>>>>>>>> really attempting to support in the spec.
>>>>>>>>>>
>>>>>>>>>> The issues we're discussing largely stem from using multiple
>>>>>>>>>> engines with cross-catalog references and configurations where
>>>>>>>>>> catalog names are not aligned.  If we have multiple engines that all
>>>>>>>>>> have the same catalog names/configurations, the current spec
>>>>>>>>>> implementation is well defined for table resolution even across
>>>>>>>>>> catalogs.  The 'default-catalog' (and namespace equivalent) was
>>>>>>>>>> intended to address the resolution within the context of the SQL
>>>>>>>>>> text, not to address catalog/naming inconsistencies.
>>>>>>>>>>
>>>>>>>>>> I feel like we're trying to adapt the original intent to address
>>>>>>>>>> the catalog naming/configuration and would argue that we shouldn't
>>>>>>>>>> attempt to do that as part of the spec.  Inconsistently named
>>>>>>>>>> catalogs are a reality, but we should consider that a
>>>>>>>>>> configuration/environmental issue, not something to solve for in the
>>>>>>>>>> spec.
>>>>>>>>>>
>>>>>>>>>> We should support and advocate for consistency in catalog naming
>>>>>>>>>> and define the spec along those lines.  The fact is that with all of
>>>>>>>>>> the recent work that's gone into making catalogs pluggable, it makes
>>>>>>>>>> more sense to just register catalog configuration with consistent
>>>>>>>>>> names (even if you have to duplicate the configuration for supporting
>>>>>>>>>> existing readers/writers).  I think it's better to provide a path
>>>>>>>>>> toward consistency than to normalize complicated schemes to work
>>>>>>>>>> around the issues caused by environmental/configuration
>>>>>>>>>> inconsistencies.
>>>>>>>>>>
>>>>>>>>>> If the goal is to create clever ways to hack the late binding
>>>>>>>>>> resolution to swap in different catalogs or make references
>>>>>>>>>> contextual, I feel like that is something we should strongly
>>>>>>>>>> discourage as it leads to confusion about what is resolved as part
>>>>>>>>>> of the query.
>>>>>>>>>>
>>>>>>>>>> At this point, I don't see a good argument to add
>>>>>>>>>> additional configuration or change the resolution behaviors.
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 28, 2025 at 12:40 AM Jan Kaul
>>>>>>>>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think the intention with the "default-catalog" was that every
>>>>>>>>>>> query engine uses it to store its session default catalog at the
>>>>>>>>>>> time of creating the view. This way the view could be reused in
>>>>>>>>>>> another session. The idea was not to introduce an additional SQL
>>>>>>>>>>> syntax to set the default-catalog.
>>>>>>>>>>>
>>>>>>>>>>> Generally we have different environments we want to support with
>>>>>>>>>>> the view spec:
>>>>>>>>>>>
>>>>>>>>>>> 1. Consistent catalog naming
>>>>>>>>>>>
>>>>>>>>>>> When the environment supports it, using consistent catalog names
>>>>>>>>>>> can have a great benefit for multi-catalog, multi-engine setups.
>>>>>>>>>>> With consistent catalog names, using the "default-catalog" field
>>>>>>>>>>> works without any issues.
>>>>>>>>>>>
>>>>>>>>>>> 2. Inconsistent catalog naming
>>>>>>>>>>>
>>>>>>>>>>> This can be the case when different query engines refer to the
>>>>>>>>>>> same physical catalog by different names. This often happens because
>>>>>>>>>>> different query engines use different strategies to set up the
>>>>>>>>>>> catalogs. If catalogs have inconsistent naming, using the
>>>>>>>>>>> "default-catalog" field does not work because it is not guaranteed
>>>>>>>>>>> that the catalog name can be resolved with another engine. Using the
>>>>>>>>>>> "view catalog" as a fallback is a better solution for this use case,
>>>>>>>>>>> as it avoids catalog names altogether. It is, however, limited to
>>>>>>>>>>> table references in the same catalog.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What do you think of introducing a view property that specifies
>>>>>>>>>>> whether the "default-catalog" or the "view catalog" should be used?
>>>>>>>>>>> This way, you could use the "default-catalog" in environments where
>>>>>>>>>>> you can guarantee consistent naming, but you would be able to fall
>>>>>>>>>>> back directly to the "view catalog" when you don't have consistent
>>>>>>>>>>> naming. The query engines could set the default for this view
>>>>>>>>>>> property at creation time. Spark, for example, could set it to
>>>>>>>>>>> automatically use the "view catalog".
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Jan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 4/26/25 05:33, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>
>>>>>>>>>>> To help folks catch up on the latest discussions and
>>>>>>>>>>> interpretation of the spec, I have summarized everything we
>>>>>>>>>>> discussed so far at the top of the proposal document (here
>>>>>>>>>>> <https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0>).
>>>>>>>>>>> I have slightly updated the proposal to be in sync with the new
>>>>>>>>>>> interpretation to avoid confusion. In summary:
>>>>>>>>>>>
>>>>>>>>>>> * Remove default-catalog and default-namespace fields from the
>>>>>>>>>>> view spec completely.
>>>>>>>>>>>
>>>>>>>>>>> * Hence, we do not attempt to define separate view-level default
>>>>>>>>>>> catalogs or namespaces.
>>>>>>>>>>>
>>>>>>>>>>> Instead:
>>>>>>>>>>>
>>>>>>>>>>> * If a table identifier inside a view lacks a catalog qualifier,
>>>>>>>>>>> engines should resolve it using the current engine catalog at query
>>>>>>>>>>> time.
>>>>>>>>>>>
>>>>>>>>>>> * Reference table identifiers in the metadata exactly as they
>>>>>>>>>>> appear in the view SQL text.
>>>>>>>>>>>
>>>>>>>>>>> * If an identifier lacks the catalog part at creation, it should
>>>>>>>>>>> still lack a catalog in the stored metadata.
>>>>>>>>>>>
>>>>>>>>>>> * Store UUIDs alongside table identifiers whenever possible.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Walaa.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:18 PM Walaa Eldin Moustafa <
>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the contribution Benny! +1 to the confusion the
>>>>>>>>>>>> fallback creates. Also just to be clear, at this point and after
>>>>>>>>>>>> clarifying the current spec intentions, I am convinced that we
>>>>>>>>>>>> should remove the default catalog and default namespace fields
>>>>>>>>>>>> altogether.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 25, 2025 at 5:13 PM Benny Chow <btc...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to contribute my opinions on this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - I don't particularly like the current behavior of "default
>>>>>>>>>>>>> to the view's catalog when default-catalog is not set".
>>>>>>>>>>>>> Fundamentally, I believe the intent of default-catalog and
>>>>>>>>>>>>> default-namespace is to help users write more concise SQL.
>>>>>>>>>>>>> - The Spark session catalog is engine specific and I don't think
>>>>>>>>>>>>> we should design something that says first use this catalog, then
>>>>>>>>>>>>> that catalog, or that catalog.  For example, resolving identifiers
>>>>>>>>>>>>> using default-catalog -> view's catalog -> session catalog is not
>>>>>>>>>>>>> good.
>>>>>>>>>>>>> - We gotta support non-Iceberg tables, otherwise I see no value
>>>>>>>>>>>>> in putting views in the catalog to share with other engines.
>>>>>>>>>>>>> - Interoperability between different engine types is very hard
>>>>>>>>>>>>> due to dialect issues, so I think we should focus on supporting
>>>>>>>>>>>>> different clusters of the same engine type on a shared catalog.
>>>>>>>>>>>>> For example, AI and BI clusters on Spark sharing the same views in
>>>>>>>>>>>>> a REST catalog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Coincidentally, I think the ultimate solution is along the
>>>>>>>>>>>>> lines of something Russell proposed last year:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.apache.org/thread/hoskfx8y3kvrcww52l4w9dxghp3pnlm7
>>>>>>>>>>>>>
>>>>>>>>>>>>> We've been looking at this interoperable identifier problem
>>>>>>>>>>>>> through the lens of catalog resolution but maybe the right
>>>>>>>>>>>>> approach is really about templating.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would extend Russell's idea to allow identifiers in a view
>>>>>>>>>>>>> to span catalogs to support non-Iceberg tables.  Also, the
>>>>>>>>>>>>> default-catalog property could be templated as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>> Benny
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 4:02 PM Walaa Eldin Moustafa <
>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Steven! How do you recommend making the Spark
>>>>>>>>>>>>>> implementation conform to the spec? Do we need Spark SQL
>>>>>>>>>>>>>> extensions and/or Spark catalog APIs for that?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> How do you recommend reconciling the inconsistencies I shared
>>>>>>>>>>>>>> regarding many resolution methods not consistently being
>>>>>>>>>>>>>> followed in different scenarios (view vs child table resolution,
>>>>>>>>>>>>>> query vs view resolution)? Note these occur when the default
>>>>>>>>>>>>>> catalog is set to a non-null value. If it helps, I can share
>>>>>>>>>>>>>> concrete examples.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 3:52 PM Steven Wu <
>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The core issue is the fallback behavior when `default-catalog`
>>>>>>>>>>>>>>> is not defined. The current view spec says the fallback should
>>>>>>>>>>>>>>> be the catalog where the view is defined. It doesn't really
>>>>>>>>>>>>>>> matter what the catalog is named (catalogX) by the read engine.
>>>>>>>>>>>>>>> - If a view refers to tables in the same catalog, this is an
>>>>>>>>>>>>>>> unambiguous and reasonable fallback behavior.
>>>>>>>>>>>>>>> - If a view refers to tables from another catalog, catalog
>>>>>>>>>>>>>>> names should be included in the reference name already. So no
>>>>>>>>>>>>>>> ambiguity there either.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Potential inconsistent naming of catalogs is a separate
>>>>>>>>>>>>>>> problem, which the Iceberg view spec probably cannot solve. We
>>>>>>>>>>>>>>> can only recommend that catalogs be named consistently across
>>>>>>>>>>>>>>> usage for better interoperability on name references.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This proposal is to change the fallback behavior to the
>>>>>>>>>>>>>>> engine's session default catalog. I am not sure it is better
>>>>>>>>>>>>>>> than the current fallback behavior.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Today’s Spark behavior explicitly differs from this idea.
>>>>>>>>>>>>>>> > Spark resolves table identifiers during view creation using
>>>>>>>>>>>>>>> > the session’s default catalog, not a supplied
>>>>>>>>>>>>>>> > `default-catalog`.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would argue that is a Spark implementation issue for not
>>>>>>>>>>>>>>> conforming to the spec.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Apr 25, 2025 at 1:17 PM Walaa Eldin Moustafa
>>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Hi Jan,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks again for continuing the discussion. I want to
>>>>>>>>>>>>>>> > highlight a few fundamental issues around the interpretation
>>>>>>>>>>>>>>> > of default-catalog:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Here is the real catch:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > * default-catalog cannot logically be defined at view
>>>>>>>>>>>>>>> > creation time. It would be circular: the view needs to exist
>>>>>>>>>>>>>>> > before its metadata (and hence default-catalog) can exist.
>>>>>>>>>>>>>>> > This is visible in Spark’s implementation, where
>>>>>>>>>>>>>>> > `default-catalog` is not used.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > * Introducing a creation-time default-catalog setting would
>>>>>>>>>>>>>>> > require extending SQL syntax and engine APIs to promote it to
>>>>>>>>>>>>>>> > a first-class view concept. This would be intrusive,
>>>>>>>>>>>>>>> > non-intuitive, and realistically very difficult to
>>>>>>>>>>>>>>> > standardize across engines.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > * Today’s Spark behavior explicitly differs from this idea.
>>>>>>>>>>>>>>> > Spark resolves table identifiers during view creation using
>>>>>>>>>>>>>>> > the session’s default catalog, not a supplied
>>>>>>>>>>>>>>> > `default-catalog`.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > * Hypothetically even if we patched in a creation-time
>>>>>>>>>>>>>>> > default-catalog, it would create an inconsistent binding
>>>>>>>>>>>>>>> > model between tables vs views (early vs late), and between
>>>>>>>>>>>>>>> > tables in views and in queries (again early vs late). For
>>>>>>>>>>>>>>> > example, views and tables in queries can withstand default
>>>>>>>>>>>>>>> > catalog renames, but tables cannot when they are used inside
>>>>>>>>>>>>>>> > views -- it even applies to views inside views, which makes
>>>>>>>>>>>>>>> > this very hard to reason about considering nesting.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>> > Walaa
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Fri, Apr 25, 2025 at 7:00 AM Jan Kaul
>>>>>>>>>>>>>>> > <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> @Walaa:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> I would argue that when you run a CREATE VIEW statement the
>>>>>>>>>>>>>>> >> query engine knows which catalog the view is being created
>>>>>>>>>>>>>>> >> in. So even though we typically use late binding to resolve
>>>>>>>>>>>>>>> >> the view catalog at query time, it can also be used at
>>>>>>>>>>>>>>> >> creation time.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> The query engine would need to keep track of the "view
>>>>>>>>>>>>>>> >> catalog" in which the view is going to be created. It can
>>>>>>>>>>>>>>> >> use that catalog to resolve partial table identifiers if
>>>>>>>>>>>>>>> >> "default-catalog" is not set.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> It can lead to some unintuitive behavior, where partial
>>>>>>>>>>>>>>> >> identifiers in the view query resolve to a different catalog
>>>>>>>>>>>>>>> >> compared to using them outside of a view.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> CREATE VIEW catalogA.sales.monthly_orders AS SELECT * FROM sales.orders;
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> If the session default catalog is not "catalogA", the
>>>>>>>>>>>>>>> >> "sales.orders" in the view query would not be the same as
>>>>>>>>>>>>>>> >> just referencing "sales.orders" in a normal SQL statement.
>>>>>>>>>>>>>>> >> This is because without a "default-catalog", the catalog
>>>>>>>>>>>>>>> >> name of "sales.orders" would default to "catalogA", which is
>>>>>>>>>>>>>>> >> the view's catalog.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Jan
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On 4/25/25 04:05, Manu Zhang wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> For example, if we want to validate that the tables
>>>>>>>>>>>>>>> >>> referenced in the view exist, how can we do that when
>>>>>>>>>>>>>>> >>> default-catalog isn't defined, since the view hasn't been
>>>>>>>>>>>>>>> >>> created or loaded yet?
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> I don't think this is related to view spec. How do we
>>>>>>>>>>>>>>> >> validate that a table exists without a default catalog, or
>>>>>>>>>>>>>>> >> do we always use the current session catalog?
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Thanks,
>>>>>>>>>>>>>>> >> Manu
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Fri, Apr 25, 2025 at 5:59 AM Walaa Eldin Moustafa
>>>>>>>>>>>>>>> >> <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Hi Jan,
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> I think we still share the same understanding. Just to
>>>>>>>>>>>>>>> >>> clarify: when I referred to late binding as “similar” to
>>>>>>>>>>>>>>> >>> the proposal, I was acknowledging the distinction between
>>>>>>>>>>>>>>> >>> view-level and table-level resolution. But as you noted,
>>>>>>>>>>>>>>> >>> both follow a late binding model.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> That said, this still raises an interesting question and a
>>>>>>>>>>>>>>> >>> potential gap: if default-catalog is only defined at query
>>>>>>>>>>>>>>> >>> time, how should resolution work during view creation? For
>>>>>>>>>>>>>>> >>> example, if we want to validate that the tables referenced
>>>>>>>>>>>>>>> >>> in the view exist, how can we do that when default-catalog
>>>>>>>>>>>>>>> >>> isn't defined, since the view hasn't been created or loaded
>>>>>>>>>>>>>>> >>> yet?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>>>>>> >>> Walaa.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> On Thu, Apr 24, 2025 at 7:02 AM Jan Kaul
>>>>>>>>>>>>>>> >>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Yes, I have the same understanding. The view catalog is
>>>>>>>>>>>>>>> >>>> resolved at query time.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> As you mentioned before, it's good to distinguish between
>>>>>>>>>>>>>>> >>>> the physical catalog and its reference used in SQL
>>>>>>>>>>>>>>> >>>> statements. The important part is that the physical
>>>>>>>>>>>>>>> >>>> catalog of the view and the tables referenced in its
>>>>>>>>>>>>>>> >>>> definition stay consistent. You could create a view in a
>>>>>>>>>>>>>>> >>>> given physical catalog by referring to it as "catalogA",
>>>>>>>>>>>>>>> >>>> as in your first point. If you then, given a different
>>>>>>>>>>>>>>> >>>> setup, refer to the same physical catalog as "catalogB" in
>>>>>>>>>>>>>>> >>>> another session/environment, the view should still work.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> I would however rephrase your last point. Late binding
>>>>>>>>>>>>>>> >>>> applies to the view catalog name and by extension to all
>>>>>>>>>>>>>>> >>>> partial table references when no "default-catalog" is
>>>>>>>>>>>>>>> >>>> present. Resolving the view catalog name at query time is
>>>>>>>>>>>>>>> >>>> not opposed to storing the view metadata in a catalog.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Or maybe I don't entirely understand what you mean.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Jan
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> On 4/24/25 00:32, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Hi Jan,
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> > The view is executed when it's being referenced in a
>>>>>>>>>>>>>>> >>>> > SQL statement. That statement contains the information
>>>>>>>>>>>>>>> >>>> > for the query engine to resolve the catalog of the view.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> If I’m understanding correctly, that means:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> * If the view is queried as SELECT * FROM
>>>>>>>>>>>>>>> >>>> catalogA.namespace.view, then catalogA is considered the
>>>>>>>>>>>>>>> >>>> view’s catalog.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> * If the same view is later queried as SELECT * FROM
>>>>>>>>>>>>>>> >>>> catalogB.namespace.view (after renaming catalogA to
>>>>>>>>>>>>>>> >>>> catalogB, and keeping everything else the same), then
>>>>>>>>>>>>>>> >>>> catalogB becomes the view’s catalog.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Is that interpretation correct? If so, it sounds to me
>>>>>>>>>>>>>>> >>>> like the catalog is resolved at query time, based on how
>>>>>>>>>>>>>>> >>>> the view is referenced, not from any stored metadata. That
>>>>>>>>>>>>>>> >>>> would imply some sort of a late binding behavior (similar
>>>>>>>>>>>>>>> >>>> to the proposal), as opposed to using some catalog that
>>>>>>>>>>>>>>> >>>> "stores" the view definition.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>> >>>> Walaa
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> On Tue, Apr 22, 2025 at 11:01 AM Jan Kaul
>>>>>>>>>>>>>>> >>>> <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Hi Walaa,
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Thanks for clarifying the aspects of non-determinism. Let
>>>>>>>>>>>>>>> >>>>> me try to address your questions.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> 1. This is my interpretation of the current spec: The
>>>>>>>>>>>>>>> >>>>> view is executed when it's being referenced in a SQL
>>>>>>>>>>>>>>> >>>>> statement. That statement contains the information for
>>>>>>>>>>>>>>> >>>>> the query engine to resolve the catalog of the view. The
>>>>>>>>>>>>>>> >>>>> query engine then uses that information to fetch the view
>>>>>>>>>>>>>>> >>>>> metadata from the catalog. It also needs to temporarily
>>>>>>>>>>>>>>> >>>>> keep track of which catalog it used to fetch the view
>>>>>>>>>>>>>>> >>>>> metadata. It can then use that information to resolve the
>>>>>>>>>>>>>>> >>>>> table references in the view's SQL definition in case no
>>>>>>>>>>>>>>> >>>>> default catalog is specified.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> 2. The important part is that the catalog can be
>>>>>>>>>>>>>>> >>>>> referenced at execution time. As long as that's the case
>>>>>>>>>>>>>>> >>>>> I would assume the view can be created in any catalog.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> I think your point is really valuable because the
>>>>>>>>>>>>>>> current specification can lead to some unintuitive behavior. 
>>>>>>>>>>>>>>> For example,
>>>>>>>>>>>>>>> consider the following statement:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> CREATE VIEW catalogA.sales.monthly_orders AS SELECT *
>>>>>>>>>>>>>>> FROM sales.orders;
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> If the session default catalog is not "catalogA", the
>>>>>>>>>>>>>>> "sales.orders" in the view query would not be the same as just 
>>>>>>>>>>>>>>> referencing
>>>>>>>>>>>>>>> "sales.orders" in a normal SQL statement. This is because 
>>>>>>>>>>>>>>> without a
>>>>>>>>>>>>>>> "default-catalog", the catalog name of "sales.orders" would 
>>>>>>>>>>>>>>> default to
>>>>>>>>>>>>>>> "catalogA".
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> However, I like the current design of the view spec,
>>>>>>>>>>>>>>> because it has the "closure" property. Because
>>>>>>>>>>>>>>> the "view
>>>>>>>>>>>>>>> catalog" has to be known when executing a view, all the 
>>>>>>>>>>>>>>> information
>>>>>>>>>>>>>>> required to resolve the table identifiers is contained in the 
>>>>>>>>>>>>>>> view metadata
>>>>>>>>>>>>>>> (and the "view catalog"). I think that if you make the 
>>>>>>>>>>>>>>> identifier
>>>>>>>>>>>>>>> resolution dependent on external parameters, it hinders 
>>>>>>>>>>>>>>> portability.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Jan
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> On 4/22/25 18:36, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Hi Jan,
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Thanks for the thoughtful feedback.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> I think it’s important we clarify a key point before
>>>>>>>>>>>>>>> going deeper:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Non-determinism is not caused by session fallback
>>>>>>>>>>>>>>> behavior—it’s a fundamental limitation of using table 
>>>>>>>>>>>>>>> identifiers alone,
>>>>>>>>>>>>>>> regardless of whether we use the current rule, the proposed 
>>>>>>>>>>>>>>> fallback to the
>>>>>>>>>>>>>>> session’s default catalog, or even early vs. late binding.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> The same fully qualified identifier (e.g.,
>>>>>>>>>>>>>>> catalogA.namespace.table) can resolve to different objects 
>>>>>>>>>>>>>>> depending solely
>>>>>>>>>>>>>>> on engine-specific routing logic or catalog aliases. So 
>>>>>>>>>>>>>>> determinism isn’t
>>>>>>>>>>>>>>> guaranteed just because an identifier is "fully qualified." The 
>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>> reliable anchor for identity is the UUID. That’s why the 
>>>>>>>>>>>>>>> proposed use of
>>>>>>>>>>>>>>> UUIDs is not just a hardening strategy. It’s the actual fix for 
>>>>>>>>>>>>>>> correctness.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> To move the conversation forward, could you help
>>>>>>>>>>>>>>> clarify two things in the context of the current spec:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> * Where in the metadata is the “view catalog” stored,
>>>>>>>>>>>>>>> so that an engine knows to fall back to it if default-catalog 
>>>>>>>>>>>>>>> is null?
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> * Are we even allowed to create views in the session's
>>>>>>>>>>>>>>> default catalog (i.e., without specifying a catalog) in the 
>>>>>>>>>>>>>>> current Iceberg
>>>>>>>>>>>>>>> spec?
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> These questions are important because if we can’t
>>>>>>>>>>>>>>> unambiguously recover the "view catalog" from metadata, then 
>>>>>>>>>>>>>>> defaulting to
>>>>>>>>>>>>>>> it is problematic. And if views can't be created in the default 
>>>>>>>>>>>>>>> catalog,
>>>>>>>>>>>>>>> then the fallback rule doesn’t generalize.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>>>>>>> >>>>> Walaa.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> On Tue, Apr 22, 2025 at 3:14 AM Jan Kaul
>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Hi Walaa,
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> thank you for your proposal. If I understood
>>>>>>>>>>>>>>> correctly, your proposal is composed of three parts:
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> - session default catalog as fallback for
>>>>>>>>>>>>>>> "default-catalog"
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> - session default namespace as fallback for
>>>>>>>>>>>>>>> "default-namespace"
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> - Late binding + UUID validation
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> I have some comments regarding these points.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> 1. Session default catalog as fallback for
>>>>>>>>>>>>>>> "default-catalog"
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Introducing a behavior that depends on the current
>>>>>>>>>>>>>>> session setup is in my opinion the definition of 
>>>>>>>>>>>>>>> "non-determinism". You
>>>>>>>>>>>>>>> could be running the same query-engine and catalog-setup on 
>>>>>>>>>>>>>>> different days,
>>>>>>>>>>>>>>> with different default session catalogs (which is rather 
>>>>>>>>>>>>>>> common), and would
>>>>>>>>>>>>>>> be getting different results.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Whereas with the current behavior, the view always
>>>>>>>>>>>>>>> produces the same results. The current behavior has some rough 
>>>>>>>>>>>>>>> edges in
>>>>>>>>>>>>>>> very niche use cases but I think it is solid for most use cases.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> 2. Session default namespace as fallback for
>>>>>>>>>>>>>>> "default-namespace"
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Similar to the above.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> 3. Late binding + UUID validation
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> If I understand it correctly, the current
>>>>>>>>>>>>>>> implementation already uses late binding.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Generally, having UUID validation makes the setup
>>>>>>>>>>>>>>> more robust. Which is great. However, having UUID validation 
>>>>>>>>>>>>>>> still requires
>>>>>>>>>>>>>>> us to have a portable table identifier specification. Even if 
>>>>>>>>>>>>>>> we have the
>>>>>>>>>>>>>>> UUIDs of the referenced tables from the view, there simply 
>>>>>>>>>>>>>>> isn't an
>>>>>>>>>>>>>>> interface that lets us use those UUIDs. The catalog interface
>>>>>>>>>>>>>>> is defined
>>>>>>>>>>>>>>> in terms of table identifiers.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> So we always require a working catalog setup and
>>>>>>>>>>>>>>> suitable table identifiers to obtain the table metadata. We can
>>>>>>>>>>>>>>> use the
>>>>>>>>>>>>>>> UUIDs to verify if we loaded the correct table. But this can 
>>>>>>>>>>>>>>> only be done
>>>>>>>>>>>>>>> after we used some identifier. Which means there is no way of 
>>>>>>>>>>>>>>> using UUIDs
>>>>>>>>>>>>>>> without a functioning catalog/identifier setup.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> In conclusion, I prefer the current behavior for
>>>>>>>>>>>>>>> "default-catalog" because it is more deterministic in my 
>>>>>>>>>>>>>>> opinion. And I
>>>>>>>>>>>>>>> think the current spec does a good job for multi-engine table 
>>>>>>>>>>>>>>> identifier
>>>>>>>>>>>>>>> resolution. I see the UUID validation more as an additional
>>>>>>>>>>>>>>> hardening
>>>>>>>>>>>>>>> strategy.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Thanks
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Jan
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> On 4/21/25 17:38, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Thanks Renjie!
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> The existing spec has some guidance on resolving
>>>>>>>>>>>>>>> catalogs on the fly already (to address the case of view text 
>>>>>>>>>>>>>>> with table
>>>>>>>>>>>>>>> identifiers missing the catalog part). The guidance is to use 
>>>>>>>>>>>>>>> the catalog
>>>>>>>>>>>>>>> where the view is stored. But I find this rule hard to 
>>>>>>>>>>>>>>> interpret or use.
>>>>>>>>>>>>>>> The catalog itself is a logical construct—such as a federated 
>>>>>>>>>>>>>>> catalog that
>>>>>>>>>>>>>>> delegates to multiple physical backends (e.g., HMS and REST). 
>>>>>>>>>>>>>>> In such
>>>>>>>>>>>>>>> cases, the catalog (e.g., `my_catalog` in 
>>>>>>>>>>>>>>> `my_catalog.namespace1.table1`)
>>>>>>>>>>>>>>> doesn’t physically store the tables; it only routes requests to 
>>>>>>>>>>>>>>> underlying
>>>>>>>>>>>>>>> stores. Therefore, defaulting identifier resolution based on 
>>>>>>>>>>>>>>> the catalog
>>>>>>>>>>>>>>> where the view is "stored" doesn’t align with how catalogs 
>>>>>>>>>>>>>>> actually behave
>>>>>>>>>>>>>>> in practice.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>>>>>>> >>>>>> Walaa.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> On Sun, Apr 20, 2025 at 11:17 PM Renjie Liu <
>>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> Hi, Walaa:
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> Thanks for the proposal.
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> I've reviewed the doc, but in general I have some
>>>>>>>>>>>>>>> concerns with resolving catalog names on the fly with query 
>>>>>>>>>>>>>>> engine defined
>>>>>>>>>>>>>>> catalog names. This introduces some flexibility at first 
>>>>>>>>>>>>>>> glance, but also
>>>>>>>>>>>>>>> makes misconfiguration difficult to explain.
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> But I agree with one part: we should store the
>>>>>>>>>>>>>>> resolved table UUID in view metadata, as table/view renaming
>>>>>>>>>>>>>>> may introduce
>>>>>>>>>>>>>>> errors that are difficult for users to understand.
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> On Sat, Apr 19, 2025 at 3:02 AM Walaa Eldin Moustafa
>>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> Looking forward to keeping up the momentum and
>>>>>>>>>>>>>>> closing out the MV spec as well. I’m hoping we can proceed to a 
>>>>>>>>>>>>>>> vote next
>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> Here is a summary in case that helps. The proposal
>>>>>>>>>>>>>>> outlines a strategy for handling table identifiers in Iceberg 
>>>>>>>>>>>>>>> view
>>>>>>>>>>>>>>> metadata, with the goal of ensuring correctness, portability, 
>>>>>>>>>>>>>>> and engine
>>>>>>>>>>>>>>> compatibility. It recommends resolving table identifiers at 
>>>>>>>>>>>>>>> read time (late
>>>>>>>>>>>>>>> binding) rather than creation time, and introduces UUID-based 
>>>>>>>>>>>>>>> validation to
>>>>>>>>>>>>>>> maintain identity guarantees across engines or sessions. It
>>>>>>>>>>>>>>> also revises
>>>>>>>>>>>>>>> how default-catalog and default-namespace are handled 
>>>>>>>>>>>>>>> (defaulting both to
>>>>>>>>>>>>>>> the session context if not explicitly set) to better align with 
>>>>>>>>>>>>>>> engine
>>>>>>>>>>>>>>> behavior and improve cross-engine interoperability.
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> Please let me know your thoughts.
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>> On Wed, Apr 16, 2025 at 2:03 PM Walaa Eldin
>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>> Thanks Eduard and Sung! I have addressed the
>>>>>>>>>>>>>>> comments.
>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>> One key point to keep in mind is that catalog
>>>>>>>>>>>>>>> names in the spec refer to logical catalogs—i.e., the first 
>>>>>>>>>>>>>>> part of a
>>>>>>>>>>>>>>> three-part identifier. These correspond to Spark's DataSourceV2 
>>>>>>>>>>>>>>> catalogs,
>>>>>>>>>>>>>>> Trino connectors, and similar constructs. This is a level of 
>>>>>>>>>>>>>>> abstraction
>>>>>>>>>>>>>>> above physical catalogs, which are not referenced or used in 
>>>>>>>>>>>>>>> the view spec.
>>>>>>>>>>>>>>> The reason is that table identifiers in the view 
>>>>>>>>>>>>>>> definition/text itself
>>>>>>>>>>>>>>> refer to logical catalogs, not physical ones (since they 
>>>>>>>>>>>>>>> interface directly
>>>>>>>>>>>>>>> with the engine and not a specific metastore).
>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>>>>>>>> >>>>>>>>> Walaa.
>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>> On Wed, Apr 16, 2025 at 6:15 AM Sung Yun <
>>>>>>>>>>>>>>> sungwy...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>> Thank you Walaa for the proposal. I think view
>>>>>>>>>>>>>>> portability is a very important topic for us to continue 
>>>>>>>>>>>>>>> discussing as it
>>>>>>>>>>>>>>> relies on many assumptions within the data ecosystem for it to 
>>>>>>>>>>>>>>> function,
>>>>>>>>>>>>>>> as you've highlighted well in the document.
>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>> I've added a few comments around how this may
>>>>>>>>>>>>>>> impact the permission questions the engines will be asking, and 
>>>>>>>>>>>>>>> whether
>>>>>>>>>>>>>>> that is the desired behavior.
>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>> Sung
>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>> On Wed, Apr 16, 2025 at 7:32 AM Eduard
>>>>>>>>>>>>>>> Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks Walaa for tackling this problem. I've
>>>>>>>>>>>>>>> added a few comments to get a better understanding of what this
>>>>>>>>>>>>>>> will look
>>>>>>>>>>>>>>> like in the actual implementation.
>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>> Eduard
>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 15, 2025 at 7:09 PM Walaa Eldin
>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>> Starting this thread to resume our discussion
>>>>>>>>>>>>>>> on how to reference table identifiers from Iceberg metadata, a 
>>>>>>>>>>>>>>> key aspect
>>>>>>>>>>>>>>> of the view specification, particularly in relation to the MV 
>>>>>>>>>>>>>>> (materialized
>>>>>>>>>>>>>>> view) extensions.
>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>> I had the chance to speak offline with a few
>>>>>>>>>>>>>>> community members to better understand how the current spec is 
>>>>>>>>>>>>>>> being
>>>>>>>>>>>>>>> interpreted. Those conversations served as inputs to a new 
>>>>>>>>>>>>>>> proposal on how
>>>>>>>>>>>>>>> table identifier references could be represented in metadata.
>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>> You can find the proposal here [1]. I look
>>>>>>>>>>>>>>> forward to your feedback and working together to move this 
>>>>>>>>>>>>>>> forward so we
>>>>>>>>>>>>>>> can finalize the MV spec as well.
>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://docs.google.com/document/d/1-I2v_OqBgJi_8HVaeH1u2jowghmXoB8XaJLzPBa_Hg8/edit?tab=t.0
>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> >>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
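
The two fallback rules debated in the quoted thread, and Jan's point that a
UUID can only be checked after a table has been loaded by identifier, can be
sketched roughly as follows. This is only an illustration under stated
assumptions: ViewMetadata, resolve_current_spec, resolve_proposed and
check_table_uuid are hypothetical names, not Iceberg APIs, and identifiers
are assumed to be dot-separated with a single-level namespace. The fields
default_catalog and default_namespace stand in for the spec's
"default-catalog" and "default-namespace".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Identifier = Tuple[str, str, str]  # (catalog, namespace, table)


@dataclass
class ViewMetadata:
    """Hypothetical stand-in for the view metadata fields under discussion."""
    default_namespace: Optional[str] = None
    default_catalog: Optional[str] = None  # None models a null "default-catalog"


def _qualify(identifier: str, catalog: str, namespace: str) -> Identifier:
    """Fill in the missing leading parts of a dot-separated identifier."""
    parts = identifier.split(".")
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return (catalog, parts[0], parts[1])
    if len(parts) == 1:
        return (catalog, namespace, parts[0])
    raise ValueError(f"unsupported identifier: {identifier!r}")


def resolve_current_spec(identifier: str, meta: ViewMetadata,
                         view_catalog: str) -> Identifier:
    """Current rule as described by Jan: a null default-catalog falls back
    to the catalog the view itself was loaded from."""
    if meta.default_namespace is None:
        raise ValueError("this sketch assumes default-namespace is set")
    catalog = meta.default_catalog if meta.default_catalog is not None else view_catalog
    return _qualify(identifier, catalog, meta.default_namespace)


def resolve_proposed(identifier: str, meta: ViewMetadata,
                     session_catalog: str, session_namespace: str) -> Identifier:
    """Proposed rule: unset fields fall back to the session context."""
    catalog = meta.default_catalog if meta.default_catalog is not None else session_catalog
    namespace = (meta.default_namespace if meta.default_namespace is not None
                 else session_namespace)
    return _qualify(identifier, catalog, namespace)


def check_table_uuid(loaded_uuid: str, expected_uuid: str) -> None:
    """UUID validation happens only after the table was loaded by identifier."""
    if loaded_uuid != expected_uuid:
        raise ValueError(
            f"table UUID mismatch: expected {expected_uuid}, got {loaded_uuid}")
```

With Jan's example (a view stored as catalogA.sales.monthly_orders whose text
references sales.orders, and no default-catalog set), resolve_current_spec
with view_catalog="catalogA" yields ("catalogA", "sales", "orders"), while
resolve_proposed with session_catalog="catalogB" yields
("catalogB", "sales", "orders").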
