Hey everyone,

If you're interested in the first round of spec-related updates for
relative paths, please take a look and add comments:
https://github.com/apache/iceberg/pull/15630

-Dan

On Mon, Mar 23, 2026 at 1:04 PM Daniel Weeks <[email protected]> wrote:

> Hey Steve,
>
> I'm not sure if you were able to get an answer to this question in any of the
> follow-up discussions we had on relative paths, but the situation you
> describe is inherent to the difference between absolute and relative paths.
>
> The spec isn't responsible for how you relocate, duplicate, etc. data if
> the base component of the relative path is updated; that is explicitly not
> covered by the design.  That's the responsibility of the catalog or
> implementation.
>
> If you want data persistence across metadata moves, you always have the
> ability to produce absolute paths to retain the v1-3 behavior.  However, I
> believe what we've learned through production deployments, and in comparison
> to other formats, is that the primary use case is to either relocate the entire
> dataset or duplicate the entire dataset, which is the basis for the
> relative path model described in the proposal.
>
> As to the catalog handling, most (all?) implementations either do not
> natively support rename (like HadoopCatalog) or treat rename as a
> metadata-only operation that does not change the table location.  The closest
> thing is probably register table in the REST catalog, but that is very much
> left up to the catalog implementation.  I think we can draw from this that
> most table relocations are being performed outside of the catalog and then
> registered in the catalog.
>
> -Dan
>
>
>
>
> On Wed, Feb 4, 2026 at 2:27 PM Steve <[email protected]> wrote:
>
>> Thanks all,
>>
>>   Following the relative path discussion last week, I want to raise a
>> question about lifecycle cleanup operations in the context of table
>> location mutability.
>> The current proposal established that "*the table location is the basis
>> for all path resolution against persisted relative paths*". Since
>> location remains mutable, this creates a behavioral difference between v3
>> and v4 tables that increases operational complexity. Here's a concrete
>> example:
>>
>> *Scenario*
>>
>> CREATE TABLE prod.db.events (
>>   event_id BIGINT,
>>   event_time TIMESTAMP,
>>   payload STRING)
>> USING iceberg
>> LOCATION 's3://bucket-a/warehouse/events';
>>
>> -- Insert some data
>> INSERT INTO prod.db.events VALUES (1, current_timestamp(), 'data1');
>> INSERT INTO prod.db.events VALUES (2, current_timestamp(), 'data2');
>>
>> -- User changes location (Spark)
>> ALTER TABLE prod.db.events SET LOCATION 's3://bucket-b/warehouse/events';
>>
>> -- Write new data
>> INSERT INTO prod.db.events VALUES (3, current_timestamp(), 'data3');
>>
>> *Result for v3 table with absolute paths*
>> Manifest entries:
>>   - s3://bucket-a/warehouse/events/data/file1.parquet  (absolute - old
>> location)
>>   - s3://bucket-a/warehouse/events/data/file2.parquet  (absolute - old
>> location)
>>   - s3://bucket-b/warehouse/events/data/file3.parquet  (absolute - new
>> location)
>> Reads work out of the box because the paths are absolute.
>> Snapshot expiration covers both locations, before and after the change,
>> as Iceberg metadata tracks each path as it was at the time of creation.
>> Orphan removal is limited, as it only considers the latest location.
>>
>> *Result for v4 table with relative paths*
>> Manifest entries:
>>   - data/file1.parquet  (relative - written when location was bucket-a)
>>   - data/file2.parquet  (relative - written when location was bucket-a)
>>   - data/file3.parquet  (relative - written when location is bucket-b)
>> Path resolution for data/file1.parquet:
>>   Resolved: s3://bucket-b/warehouse/events/data/file1.parquet  ❌
>>   Actual:   s3://bucket-a/warehouse/events/data/file1.parquet
>> Reads will fail after the location change unless files are physically moved
>> (either by the catalog or by a background process).
>> Snapshot expiration and orphan removal will not cover the location used
>> before the update.
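A minimal Python sketch of the two behaviors above, under a simplified resolution rule that is an assumption and not taken from the spec PR: a path with a URI scheme is absolute and used as-is, and anything else is joined onto the *current* table location. It also assumes the relative entry is stored as `data/file1.parquet`, which is what the resolved path in the example implies. The helpers `is_absolute` and `resolve` are hypothetical.

```python
from urllib.parse import urlparse


def is_absolute(path: str) -> bool:
    # Assumption: a path is absolute if it carries a URI scheme such as "s3".
    return bool(urlparse(path).scheme)


def resolve(table_location: str, path: str) -> str:
    # Absolute (v1-3 style) paths are returned unchanged; relative (v4 style)
    # paths are appended to whatever the table location is *now*.
    if is_absolute(path):
        return path
    return table_location.rstrip("/") + "/" + path.lstrip("/")


old_location = "s3://bucket-a/warehouse/events"
new_location = "s3://bucket-b/warehouse/events"

# v3: the absolute path written before the location change still resolves correctly.
v3_entry = old_location + "/data/file1.parquet"
print(resolve(new_location, v3_entry))

# v4: the same file stored relatively now resolves under bucket-b.
v4_entry = "data/file1.parquet"
print(resolve(new_location, v4_entry))
```

Only the table location changed, yet the v4 entry now points into bucket-b, which is the read failure described above; the v3 entry is unaffected because it never consults the location.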
>>
>> *Question*
>> In v1-3, updating location is a lightweight, metadata-only operation
>> which only impacts future writes, and existing absolute paths continue to
>> resolve correctly for reads. In v4, this is no longer the case. A location
>> update becomes a breaking change that requires physical file movement to
>> maintain correctness. From what I can tell, a catalog can either validate
>> and handle the movement, rewrite paths to absolute, or reject the update to
>> make location effectively immutable. Understandably, the Iceberg spec does
>> not want to prescribe catalog behavior, but should we acknowledge this
>> behavior change and document the lifecycle cleanup implications? It would be
>> great if we could discuss this further before the spec is finalized.
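Two of the catalog options listed above (rewriting existing paths to absolute, or rejecting the update) can be sketched in a few lines. Everything here is hypothetical: `TableState` stands in for table metadata plus the data file paths recorded in manifests, `update_location` and its `policy` values are made-up names, and a real catalog would have to rewrite manifests rather than mutate an in-memory list. Physically moving files is left out because it depends on the storage layer.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


def is_absolute(path: str) -> bool:
    # Assumption: any path with a URI scheme (e.g. "s3://") is absolute.
    return bool(urlparse(path).scheme)


@dataclass
class TableState:
    # Hypothetical stand-in for table metadata plus the data file paths
    # recorded in its manifests.
    location: str
    data_files: list = field(default_factory=list)


def update_location(table: TableState, new_location: str, policy: str) -> None:
    if policy == "reject":
        # Location becomes effectively immutable once relative paths exist.
        if any(not is_absolute(p) for p in table.data_files):
            raise ValueError("cannot change location: table has relative paths")
    elif policy == "rewrite":
        # Pin existing relative paths to the old location so reads keep working.
        old = table.location.rstrip("/")
        table.data_files = [
            p if is_absolute(p) else f"{old}/{p}" for p in table.data_files
        ]
    else:
        raise ValueError(f"unknown policy: {policy}")
    table.location = new_location


events = TableState("s3://bucket-a/warehouse/events", ["data/file1.parquet"])
update_location(events, "s3://bucket-b/warehouse/events", "rewrite")
print(events.data_files)
```

The `rewrite` policy restores the v1-3 behavior for existing files at the cost of rewriting manifests, while `reject` leaves manifests untouched but makes the location effectively immutable once relative paths have been written.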
>>
>> Thanks,
>> Steve Zhang
>>
>>
>>
>> On Thu, Jan 29, 2026 at 5:48 PM Talat Uyarer via dev <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> We had a productive meeting today regarding the Relative Paths proposal.
>>>
>>> We've reached a general agreement on the approach. The changes will
>>> involve explicitly defining path terminology (such as "absolute location")
>>> and should be well-contained within a new section on Table Spec.
>>>
>>> The next step is to open a PR with the proposed changes, which may
>>> include knock-on effects for the REST specification, such as updates to
>>> register table and load table requests.
>>>
>>> If you'd like to access the meeting notes:
>>> https://docs.google.com/document/d/1t0RxrK-nsCT83zXeD66kmGx_TMU2X8_xfN1A_k6dCV0/edit?usp=sharing
>>>
>>> You can find the recording here:
>>> https://drive.google.com/file/d/11q65achM_3vCfaEVYsxmfAdbKQJb2drA/view?usp=sharing
>>>
>>> Thanks, everyone
>>>
>>> Talat
>>>
>>> On Fri, Aug 1, 2025 at 10:50 AM Wing Yew Poon
>>> <[email protected]> wrote:
>>>
>>>> Dan,
>>>> Thanks for the clarifications.
>>>> Looking forward to the sync.
>>>> - Wing Yew
>>>>
>>>>
>>>> On Fri, Aug 1, 2025 at 8:43 AM Daniel Weeks <[email protected]> wrote:
>>>>
>>>>> Hey Wing Yew,
>>>>>
>>>>> I see that you have been updating the Google doc containing the
>>>>>> proposal.
>>>>>
>>>>>
>>>>> That's correct, I've been working with Talat to update the doc based
>>>>> on feedback from the comments and first round of discussion we had on this
>>>>> topic.
>>>>>
>>>>> Looking through it now, as far as I can tell, the basic idea (from the
>>>>>> original proposal) of inferring the table location from the path to the
>>>>>> current metadata.json has not changed. Is my reading correct?
>>>>>
>>>>>
>>>>> So far, nothing has changed about table location inference, but we
>>>>> will probably be revisiting this with respect to other
>>>>> updates/clarifications.  There are still a couple of open comments related to
>>>>> this point, but it is one of the main goals.
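For reference, inferring the table location from the path to the current metadata.json could look like the following sketch. It assumes Iceberg's default layout, where metadata files sit directly under `<table location>/metadata/`; the function name `infer_table_location` and the example file name are hypothetical, and the proposal's actual construction rules (including any handling of custom metadata paths) may differ.

```python
def infer_table_location(metadata_file: str) -> str:
    # Assumption: metadata files live directly under "<table location>/metadata/",
    # Iceberg's default layout; a custom metadata path would not fit this rule.
    parent, sep, name = metadata_file.rpartition("/")
    if not sep or not name.endswith(".metadata.json"):
        raise ValueError(f"not a metadata.json path: {metadata_file}")
    location, sep, last_dir = parent.rpartition("/")
    if not sep or last_dir != "metadata":
        raise ValueError(f"not under a 'metadata' directory: {metadata_file}")
    return location


# After copying the whole table to bucket-b and registering the copied
# metadata.json, the inferred location follows the copy.
print(infer_table_location(
    "s3://bucket-b/warehouse/events/metadata/00003-abc.metadata.json"
))  # s3://bucket-b/warehouse/events
```

Because the location is derived rather than stored, relocating or duplicating the entire table and registering the new metadata.json moves the base for every relative path along with it.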
>>>>>
>>>>> You have added clarification around how the path to the metadata is
>>>>>> constructed from table location (from which the table location is thus
>>>>>> reverse engineered) and around path relativization, but the original idea
>>>>>> does not appear to have changed. In that case, the use case of having a
>>>>>> single copy of metadata but more than one copy of data (two or more
>>>>>> locations) is not supported by the proposal. This was the sticking point in
>>>>>> the last sync to discuss the proposal.
>>>>>
>>>>>
>>>>> I don't believe this was the sticking point from the original
>>>>> discussion.  Having multiple copies/locations of the same data files under
>>>>> a single table's management is explicitly a non-goal.  It was discussed in
>>>>> the comments of the doc for caching/fallback use cases, but I think that's
>>>>> better handled by specific engine/fileio implementations.
>>>>>
>>>>> The main sticking points were confusion around the complexity of how
>>>>> paths are constructed/persisted and the interplay between
>>>>> table/metadata/data locations depending on how those values are set in the
>>>>> table metadata.  Based on that feedback, we're suggesting some changes,
>>>>> which primarily consist of: 1) defining path construction, resolution,
>>>>> and relativization separately, 2) making all paths relative to the table
>>>>> location (which simplifies resolution/relativization), and 3) addressing
>>>>> confusing/complex issues like path separators and expectations around
>>>>> separators.
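A minimal sketch of relativization and resolution against the table location, assuming `/` is the only separator and that any path with a URI scheme is absolute. The helper names (`base_prefix`, `relativize`, `resolve`) are hypothetical and the rules are simplified relative to the proposal. The separator normalization is the interesting part: without the trailing `/`, a location of `.../events` would wrongly claim files under `.../events_old/`.

```python
from urllib.parse import urlparse


def is_absolute(path: str) -> bool:
    # Assumption: any path with a URI scheme (e.g. "s3://") is absolute.
    return bool(urlparse(path).scheme)


def base_prefix(table_location: str) -> str:
    # Normalize the location to end with exactly one "/" separator.
    return table_location.rstrip("/") + "/"


def relativize(table_location: str, path: str) -> str:
    # Strip the table location prefix; paths outside the location stay absolute.
    prefix = base_prefix(table_location)
    if path.startswith(prefix):
        return path[len(prefix):]
    return path


def resolve(table_location: str, path: str) -> str:
    # Absolute paths pass through; relative paths are joined to the location.
    if is_absolute(path):
        return path
    return base_prefix(table_location) + path


location = "s3://bucket-a/warehouse/events/"
inside = "s3://bucket-a/warehouse/events/data/file1.parquet"
outside = "s3://bucket-a/warehouse/events_old/data/file9.parquet"

print(relativize(location, inside))    # data/file1.parquet
print(relativize(location, outside))   # unchanged: not under ".../events/"
print(resolve(location, relativize(location, inside)) == inside)  # True
```

Keeping construction, resolution, and relativization as separate functions makes the round trip easy to check: resolving a relativized path must give back the original, whether or not the location was written with a trailing separator.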
>>>>>
>>>>> We're still in the process of updating the document, but we will
>>>>> schedule another sync to discuss these updates in detail and address a few
>>>>> points that are still outstanding.
>>>>>
>>>>> Thanks,
>>>>> Dan
>>>>>
>>>>> On Thu, Jul 31, 2025 at 5:47 PM Wing Yew Poon
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi Daniel Weeks,
>>>>>> I see that you have been updating the Google doc containing the
>>>>>> proposal.
>>>>>> Looking through it now, as far as I can tell, the basic idea (from
>>>>>> the original proposal) of inferring the table location from the path to
>>>>>> the current metadata.json has not changed. Is my reading correct?
>>>>>> You have added clarification around how the path to the metadata is
>>>>>> constructed from table location (from which the table location is thus
>>>>>> reverse engineered) and around path relativization, but the original idea
>>>>>> does not appear to have changed. In that case, the use case of having a
>>>>>> single copy of metadata but more than one copy of data (two or more
>>>>>> locations) is not supported by the proposal. This was the sticking point in
>>>>>> the last sync to discuss the proposal.
>>>>>> Do you intend to have another sync to continue the discussion?
>>>>>> Thanks,
>>>>>> Wing Yew
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 10, 2025 at 4:41 PM Anurag Mantripragada
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Kevin, yes, I see the recording link too but don’t have
>>>>>>> access. I have requested access.
>>>>>>>
>>>>>>>
>>>>>>> ~ Anurag Mantripragada
>>>>>>>
>>>>>>>
>>>>>>> On Jul 10, 2025, at 2:43 PM, Kevin Liu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Yes, it was recorded. Dan or Talat should have the recording. I see
>>>>>>> there's already a link for the recording associated with the gcal event but
>>>>>>> I don't have access to it.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Thu, Jul 10, 2025 at 12:37 PM Anurag Mantripragada
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hey folks, was the sync recorded? I missed it due to calendar sync
>>>>>>>> issues :(
>>>>>>>>
>>>>>>>>
>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>
>>>>>>>> On Jul 7, 2025, at 6:27 PM, ally heev <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Thanks. I can see it now
>>>>>>>>
>>>>>>>> On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can see the new event on the dev calendar.
>>>>>>>>>
>>>>>>>>> Subscribe to the "Iceberg Dev Events" calendar here:
>>>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kevin Liu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Ally (and everyone else).
>>>>>>>>>>
>>>>>>>>>> We hadn't scheduled the discussion for relative paths, but I just
>>>>>>>>>> added an event to the dev calendar for Thursday at 9am (PT).
>>>>>>>>>>
>>>>>>>>>> Let me know if you still don't see it on the calendar.
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Talat
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the update. I will do a new pass on the doc.
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> JB
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 28, 2025 at 12:13 AM Talat Uyarer
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Hi, Iceberg Community,
>>>>>>>>>>> >
>>>>>>>>>>> > As mentioned at the last sync, Dan and I have been working on
>>>>>>>>>>> a proposal to add support for relative paths, which has been a
>>>>>>>>>>> long-requested feature. There have been a number of discussions/proposals
>>>>>>>>>>> over the years, but we'd like to scope down and refocus effort to make
>>>>>>>>>>> some meaningful progress on this issue.
>>>>>>>>>>> >
>>>>>>>>>>> > Please take a look at the linked doc and provide feedback.
>>>>>>>>>>> We'd love to open up discussion on this topic at the next community
>>>>>>>>>>> sync, and we can hold one-off syncs on the topic if there's a lot of
>>>>>>>>>>> interest.
>>>>>>>>>>> >
>>>>>>>>>>> > You can access Iceberg's first V4 spec change here :)
>>>>>>>>>>> >
>>>>>>>>>>> > Proposal Issue: https://github.com/apache/iceberg/issues/13141
>>>>>>>>>>> > Doc: https://s.apache.org/iceberg-spec-relative-path
>>>>>>>>>>> >
>>>>>>>>>>> > Talat
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
