Thanks to everyone who provided feedback.

I've incorporated feedback from the first round and updated the PR.

Please take a second (or first) look.

-Dan

On Mon, Mar 23, 2026 at 1:05 PM Daniel Weeks <[email protected]> wrote:

> Hey everyone,
>
> If you're interested in the first round of spec-related updates for
> relative paths, please take a look and add comments:
> https://github.com/apache/iceberg/pull/15630
>
> -Dan
>
> On Mon, Mar 23, 2026 at 1:04 PM Daniel Weeks <[email protected]> wrote:
>
>> Hey Steve,
>>
>> I'm not sure if you were able to get an answer on this question in any of
>> the follow up discussions we had on relative paths, but the situation you
>> describe is inherent to the difference between absolute and relative paths.
>>
>> The spec isn't responsible for how you relocate/duplicate/etc. data if the
>> base component of the relative path is updated; that is explicitly not
>> covered by the design.  It is the responsibility of the catalog or
>> implementation.
>>
>> If you want data persistence across metadata moves, you always have the
>> ability to produce absolute paths to retain the v1-3 behavior.  However, I
>> believe what we've learned through production deployments, and in comparison
>> to other formats, is that the primary use case is to either relocate the
>> entire dataset or duplicate the entire dataset, which is the basis for the
>> relative path model described in the proposal.
>>
>> As to the catalog handling, most (all?) implementations either do not
>> natively support rename (like HadoopCatalog) or treat rename as a
>> metadata-only operation that does not change the table location.  The closest
>> thing is probably register table in the REST catalog, but that is very much
>> left up to the catalog implementation.  I think we can draw from this that
>> most table relocations are being performed outside of the catalog and then
>> registered in the catalog.
>>
>> -Dan
>>
>>
>>
>>
>> On Wed, Feb 4, 2026 at 2:27 PM Steve <[email protected]> wrote:
>>
>>> Thanks all,
>>>
>>>   Following the relative path discussion last week, I want to raise a
>>> question about lifecycle clean up operations in the context of table
>>> location mutability.
>>> The current proposal established that "*the table location is the basis
>>> for all path resolution against persisted relative paths*". Since
>>> location remains mutable, this creates a behavioral difference between v3
>>> and v4 tables that increases operational complexity. Here's a concrete
>>> example:
>>>
>>> *Scenario*
>>>
>>> CREATE TABLE prod.db.events (
>>>   event_id BIGINT,
>>>   event_time TIMESTAMP,
>>>   payload STRING
>>> ) USING iceberg
>>> LOCATION 's3://bucket-a/warehouse/events';
>>>
>>> -- Insert some data
>>> INSERT INTO prod.db.events VALUES (1, current_timestamp(), 'data1');
>>> INSERT INTO prod.db.events VALUES (2, current_timestamp(), 'data2');
>>>
>>> -- User changes location (Spark)
>>> ALTER TABLE prod.db.events SET location 's3://bucket-b/warehouse/events';
>>>
>>> -- Write new data
>>> INSERT INTO prod.db.events VALUES (3, current_timestamp(), 'data3');
>>>
>>> *Result for v3 table on absolute path*
>>> Manifest entries:
>>>   - s3://bucket-a/warehouse/events/data/file1.parquet  (absolute - old
>>> location)
>>>   - s3://bucket-a/warehouse/events/data/file2.parquet  (absolute - old
>>> location)
>>>   - s3://bucket-b/warehouse/events/data/file3.parquet  (absolute - new
>>> location)
>>> Reads work out of the box as paths are absolute.
>>> Snapshot expiration will cover both locations, before and after the
>>> change, as Iceberg metadata tracks the path at the time of creation.
>>> Orphan removal is limited as it will respect only the latest
>>> location.
>>>
>>> *Result for v4 table on relative path*
>>> Manifest entries:
>>>   - file1.parquet  (relative - written when location was bucket-a)
>>>   - file2.parquet  (relative - written when location was bucket-a)
>>>   - file3.parquet  (relative - written when location is bucket-b)
>>> Path resolution for file1.parquet:
>>>   Resolved: s3://bucket-b/warehouse/events/data/file1.parquet  ❌
>>>   Actual:   s3://bucket-a/warehouse/events/data/file1.parquet
>>> Reads will fail after location change unless files are physically moved
>>> (either by catalog or by background process)
>>> Snapshot expiration and orphan removal will not cover locations before
>>> the update.
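
A minimal sketch of the resolution behavior in the scenario above, assuming a
relative path is resolved by joining it to the current table location. The
`resolve` helper and the `data/` prefix on the stored path are illustrative
assumptions, not the algorithm defined by the spec PR:

```python
def resolve(table_location: str, path: str) -> str:
    """Resolve a persisted path against the current table location.

    Hypothetical helper: a path containing a URI scheme is treated as
    absolute and returned unchanged; anything else is joined to the location.
    """
    if "://" in path:
        return path
    return table_location.rstrip("/") + "/" + path.lstrip("/")


old_location = "s3://bucket-a/warehouse/events"
new_location = "s3://bucket-b/warehouse/events"

# v3-style entry: absolute, so a location change does not affect it.
v3_entry = "s3://bucket-a/warehouse/events/data/file1.parquet"
# v4-style entry: relative (the "data/" prefix is assumed for illustration).
v4_entry = "data/file1.parquet"

v3_resolved = resolve(new_location, v3_entry)
v4_resolved = resolve(new_location, v4_entry)
v4_before_change = resolve(old_location, v4_entry)

print(v3_resolved)       # still points at bucket-a
print(v4_resolved)       # now points at bucket-b, where file1 was never written
print(v4_before_change)  # where file1 actually lives
```

Under these assumptions, the relative entry follows the table location, so
reads of file1 only succeed after the location change if the file has been
moved or copied to bucket-b.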
>>>
>>> *Question*
>>> In v1-3, updating location is a lightweight, metadata-only operation
>>> which only impacts future writes, and existing absolute paths continue to
>>> resolve correctly for read. In v4, this is no longer the case. A location
>>> update becomes a breaking change that requires physical file movement to
>>> maintain correctness. From what I can tell, a catalog can either validate
>>> and handle the movement, rewrite paths to absolute, or reject the update to
>>> make location effectively immutable. Understandably, the Iceberg spec does
>>> not want to prescribe catalog guidance, but should we acknowledge this
>>> behavior change and document the lifecycle cleanup implications? It would be
>>> great if we could discuss further before the spec is finalized.
>>>
>>> Thanks,
>>> Steve Zhang
>>>
>>>
>>>
>>> On Thu, Jan 29, 2026 at 5:48 PM Talat Uyarer via dev <
>>> [email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We had a productive meeting today regarding the Relative Paths proposal.
>>>>
>>>> We've reached a general agreement on the approach. The changes will
>>>> involve explicitly defining path terminology (such as "absolute location")
>>>> and should be well-contained within a new section of the Table Spec.
>>>>
>>>> The next step is to open a PR with the proposed changes, which may
>>>> include knock-on effects for the REST specification, such as updates to
>>>> register table and load table requests.
>>>>
>>>> If you'd like to access the meeting notes:
>>>> https://docs.google.com/document/d/1t0RxrK-nsCT83zXeD66kmGx_TMU2X8_xfN1A_k6dCV0/edit?usp=sharing
>>>>
>>>> You can find the recording here:
>>>> https://drive.google.com/file/d/11q65achM_3vCfaEVYsxmfAdbKQJb2drA/view?usp=sharing
>>>>
>>>> Thanks, everyone
>>>>
>>>> Talat
>>>>
>>>> On Fri, Aug 1, 2025 at 10:50 AM Wing Yew Poon
>>>> <[email protected]> wrote:
>>>>
>>>>> Dan,
>>>>> Thanks for the clarifications.
>>>>> Looking forward to the sync.
>>>>> - Wing Yew
>>>>>
>>>>>
>>>>> On Fri, Aug 1, 2025 at 8:43 AM Daniel Weeks <[email protected]> wrote:
>>>>>
>>>>>> Hey Wing Yew,
>>>>>>
>>>>>>> I see that you have been updating the Google doc containing the
>>>>>>> proposal.
>>>>>>
>>>>>>
>>>>>> That's correct, I've been working with Talat to update the doc based
>>>>>> on feedback from the comments and first round of discussion we had on
>>>>>> this topic.
>>>>>>
>>>>>>> Looking through it now, as far as I can tell, the basic idea (from
>>>>>>> the original proposal) of inferring the table location from the path to
>>>>>>> the current metadata.json has not changed. Is my reading correct?
>>>>>>
>>>>>>
>>>>>> So far, nothing has changed about table location inference, but we
>>>>>> will probably be revisiting this with respect to other
>>>>>> updates/clarifications.  There are still a couple of open comments
>>>>>> related to this point, but it is one of the main goals.
>>>>>>
>>>>>>> You have added clarification around how the path to the metadata is
>>>>>>> constructed from table location (from which the table location is thus
>>>>>>> reverse engineered) and around path relativization, but the original
>>>>>>> idea does not appear to have changed. In that case, the use case of
>>>>>>> having a single copy of metadata but more than one copy of data (two or
>>>>>>> more locations) is not supported by the proposal. This was the sticking
>>>>>>> point in the last sync to discuss the proposal.
>>>>>>
>>>>>>
>>>>>> I don't believe this was the sticking point from the original
>>>>>> discussion.  Having multiple copies/locations of the same data files
>>>>>> under a single table's management is explicitly a non-goal.  It was
>>>>>> discussed in the comments of the doc for caching/fallback use cases, but
>>>>>> I think that's better handled by specific engine/fileio implementations.
>>>>>>
>>>>>> The main sticking points were confusion around the complexity of how
>>>>>> paths are constructed/persisted and the interplay between
>>>>>> table/metadata/data locations depending on how those values are set in
>>>>>> the table metadata.  Based on that feedback, we're suggesting some
>>>>>> changes, which primarily consist of: 1) defining path construction,
>>>>>> resolution, and relativization separately, 2) making all paths relative
>>>>>> to the table location (which simplifies resolution/relativization), and
>>>>>> 3) addressing confusing/complex issues like path separators and
>>>>>> expectations around separators.
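
To illustrate why treating resolution and relativization as separate
operations against a single table location keeps things simple, here is a
rough sketch. Both helpers are hypothetical, and the separator handling (a
single "/" between location and path) is an assumption rather than what the
proposal specifies:

```python
def relativize(table_location: str, absolute_path: str) -> str:
    """Strip the table location prefix; leave paths outside it unchanged."""
    # Normalizing to exactly one trailing "/" avoids matching sibling
    # prefixes such as ".../events2" against ".../events".
    base = table_location.rstrip("/") + "/"
    if absolute_path.startswith(base):
        return absolute_path[len(base):]
    return absolute_path


def resolve(table_location: str, path: str) -> str:
    """Inverse of relativize: join a relative path to the table location."""
    if "://" in path:
        return path  # already absolute
    return table_location.rstrip("/") + "/" + path


location = "s3://bucket/warehouse/events/"  # trailing separator on purpose
inside = "s3://bucket/warehouse/events/data/f.parquet"
sibling = "s3://bucket/warehouse/events2/data/f.parquet"
outside = "s3://elsewhere/archive/f.parquet"

rel = relativize(location, inside)
# Resolving against a different location models relocating the whole table.
moved = resolve("s3://other-bucket/copy/events", rel)

print(rel)
print(moved)
print(relativize(location, sibling))
print(relativize(location, outside))
```

In this sketch, only files under the table location are stored as relative
paths; anything else stays absolute and is unaffected by a relocation.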
>>>>>>
>>>>>> We're still in the process of updating the document, but we will
>>>>>> schedule another sync to discuss these updates in detail and address a
>>>>>> few points that are still outstanding.
>>>>>>
>>>>>> Thanks,
>>>>>> Dan
>>>>>>
>>>>>> On Thu, Jul 31, 2025 at 5:47 PM Wing Yew Poon
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Daniel Weeks,
>>>>>>> I see that you have been updating the Google doc containing the
>>>>>>> proposal.
>>>>>>> Looking through it now, as far as I can tell, the basic idea (from
>>>>>>> the original proposal) of inferring the table location from the path to
>>>>>>> the current metadata.json has not changed. Is my reading correct?
>>>>>>> You have added clarification around how the path to the metadata is
>>>>>>> constructed from table location (from which the table location is thus
>>>>>>> reverse engineered) and around path relativization, but the original
>>>>>>> idea does not appear to have changed. In that case, the use case of
>>>>>>> having a single copy of metadata but more than one copy of data (two or
>>>>>>> more locations) is not supported by the proposal. This was the sticking
>>>>>>> point in the last sync to discuss the proposal.
>>>>>>> Do you intend to have another sync to continue the discussion?
>>>>>>> Thanks,
>>>>>>> Wing Yew
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 10, 2025 at 4:41 PM Anurag Mantripragada
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Kevin, yes, I see the recording link too but don’t have
>>>>>>>> access. I have requested access.
>>>>>>>>
>>>>>>>>
>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 10, 2025, at 2:43 PM, Kevin Liu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Yes, it was recorded. Dan or Talat should have the recording. I see
>>>>>>>> there's already a link for the recording associated with the gcal
>>>>>>>> event but I don't have access to it.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>> On Thu, Jul 10, 2025 at 12:37 PM Anurag Mantripragada
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey folks, was the sync recorded? I missed it due to calendar sync
>>>>>>>>> issues :(
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>
>>>>>>>>> On Jul 7, 2025, at 6:27 PM, ally heev <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Thanks. I can see it now
>>>>>>>>>
>>>>>>>>> On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I can see the new event on the dev calendar.
>>>>>>>>>>
>>>>>>>>>> Subscribe to the "Iceberg Dev Events" calendar here:
>>>>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Ally (and everyone else).
>>>>>>>>>>>
>>>>>>>>>>> We hadn't scheduled the discussion for relative paths, but I
>>>>>>>>>>> just added an event to the dev calendar for Thursday at 9am (PT).
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you still don't see it on the calendar.
>>>>>>>>>>>
>>>>>>>>>>> -Dan
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Talat
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the update. I will do a new pass on the doc.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 28, 2025 at 12:13 AM Talat Uyarer
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi, Iceberg Community,
>>>>>>>>>>>> >
>>>>>>>>>>>> > As mentioned at the last sync, Dan and I have been working on a
>>>>>>>>>>>> > proposal to add support for relative paths, which has been a
>>>>>>>>>>>> > long-requested feature. There have been a number of
>>>>>>>>>>>> > discussions/proposals over the years, but we'd like to scope down
>>>>>>>>>>>> > and refocus effort to make some meaningful progress on this issue.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Please take a look at the linked doc and provide feedback. We'd
>>>>>>>>>>>> > love to open up discussion on this topic at the next community
>>>>>>>>>>>> > sync and we can hold one-off syncs on the topic if there's a lot
>>>>>>>>>>>> > of interest.
>>>>>>>>>>>> >
>>>>>>>>>>>> > You can access Iceberg's First V4 Spec change from here :)
>>>>>>>>>>>> >
>>>>>>>>>>>> > Proposal Issue: https://github.com/apache/iceberg/issues/13141
>>>>>>>>>>>> > Doc: https://s.apache.org/iceberg-spec-relative-path
>>>>>>>>>>>> >
>>>>>>>>>>>> > Talat
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
