Thanks to everyone who provided feedback. I've incorporated feedback from the first round and updated the PR.
Please take a second (or first) look.

-Dan

On Mon, Mar 23, 2026 at 1:05 PM Daniel Weeks <[email protected]> wrote:

> Hey everyone,
>
> If you're interested in the first round of spec-related updates for
> relative paths, please take a look and add comments:
> https://github.com/apache/iceberg/pull/15630
>
> -Dan
>
> On Mon, Mar 23, 2026 at 1:04 PM Daniel Weeks <[email protected]> wrote:
>
>> Hey Steve,
>>
>> I'm not sure if you were able to get an answer on this question in any of
>> the follow-up discussions we had on relative paths, but the situation you
>> describe is inherent to the difference between absolute and relative paths.
>>
>> The spec isn't responsible for how you relocate/duplicate/etc. data if the
>> base component of the relative path is updated; that is explicitly not
>> covered by the design. That's the responsibility of the catalog or
>> implementation.
>>
>> If you want data persistence across metadata moves, you always have the
>> ability to produce absolute paths to retain the v1-3 behavior. However, I
>> believe what we've learned through production deployments and in comparison
>> to other formats is that the primary use case is to either relocate the
>> entire dataset or duplicate the entire dataset, which is the basis for the
>> relative path model described in the proposal.
>>
>> As to the catalog handling, most (all?) implementations either do not
>> natively support rename (like HadoopCatalog) or treat rename as a
>> metadata-only operation that does not change the table location. The
>> closest thing is probably register table in the REST catalog, but that is
>> very much left up to the catalog implementation. I think we can draw from
>> this that most table relocations are being performed outside of the catalog
>> and then registered in the catalog.
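A minimal sketch of the resolution behavior discussed in this message, not taken from the spec or the PR. It assumes that a persisted path with a URI scheme is absolute and used as-is, and that any other path is joined to the current table location. The function name `resolve_path` and the file names are hypothetical, chosen to match Steve's scenario quoted further down.

```python
from urllib.parse import urlparse


def resolve_path(table_location: str, persisted_path: str) -> str:
    """Resolve a persisted path against the current table location.

    Assumption for illustration only: a path with a URI scheme
    (s3://, hdfs://, ...) is absolute and returned unchanged; anything
    else is treated as relative to the table location.
    """
    if urlparse(persisted_path).scheme:
        return persisted_path
    return table_location.rstrip("/") + "/" + persisted_path.lstrip("/")


old_location = "s3://bucket-a/warehouse/events"
new_location = "s3://bucket-b/warehouse/events"

# v1-3 style: an absolute entry keeps pointing at bucket-a after the
# table location changes.
absolute = "s3://bucket-a/warehouse/events/data/file1.parquet"
print(resolve_path(new_location, absolute))

# Relative style: the same entry follows the table location, so it only
# resolves to a real file if the whole dataset was relocated or duplicated.
relative = "data/file1.parquet"
print(resolve_path(old_location, relative))
print(resolve_path(new_location, relative))
```

Under these assumptions, producing absolute paths preserves the old behavior, while relative entries stay valid only when the data moves together with the location.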
>>
>> -Dan
>>
>> On Wed, Feb 4, 2026 at 2:27 PM Steve <[email protected]> wrote:
>>
>>> Thanks all,
>>>
>>> Following the relative path discussion last week, I want to raise a
>>> question about lifecycle cleanup operations in the context of table
>>> location mutability.
>>> The current proposal established that "*the table location is the basis
>>> for all path resolution against persisted relative paths*". Since
>>> location remains mutable, this creates a behavioral difference between v3
>>> and v4 tables that increases operational complexity. Here's a concrete
>>> example:
>>>
>>> *Scenario*
>>>
>>> CREATE TABLE prod.db.events (
>>>   event_id BIGINT,
>>>   event_time TIMESTAMP,
>>>   payload STRING)
>>> USING iceberg
>>> LOCATION 's3://bucket-a/warehouse/events';
>>>
>>> -- Insert some data
>>> INSERT INTO prod.db.events VALUES (1, current_timestamp(), 'data1');
>>> INSERT INTO prod.db.events VALUES (2, current_timestamp(), 'data2');
>>>
>>> -- User changes location (Spark)
>>> ALTER TABLE prod.db.events SET LOCATION 's3://bucket-b/warehouse/events';
>>>
>>> -- Write new data
>>> INSERT INTO prod.db.events VALUES (3, current_timestamp(), 'data3');
>>>
>>> *Result for v3 table on absolute path*
>>> Manifest entries:
>>> - s3://bucket-a/warehouse/events/data/file1.parquet (absolute - old location)
>>> - s3://bucket-a/warehouse/events/data/file2.parquet (absolute - old location)
>>> - s3://bucket-b/warehouse/events/data/file3.parquet (absolute - new location)
>>> Reads work out of the box as paths are absolute.
>>> Snapshot expiration will cover both locations before and after the
>>> change as Iceberg metadata tracks the path at the time of creation.
>>> Orphan removal is limited as it will respect only the latest location.
>>>
>>> *Result for v4 table on relative path*
>>> Manifest entries:
>>> - file1.parquet (relative - written when location was bucket-a)
>>> - file2.parquet (relative - written when location was bucket-a)
>>> - file3.parquet (relative - written when location is bucket-b)
>>> Path resolution for file1.parquet:
>>>   Resolved: s3://bucket-b/warehouse/events/data/file1.parquet ❌
>>>   Actual: s3://bucket-a/warehouse/events/data/file1.parquet
>>> Reads will fail after the location change unless files are physically
>>> moved (either by the catalog or by a background process).
>>> Snapshot expiration and orphan removal will not cover locations before
>>> the update.
>>>
>>> *Question*
>>> In v1-3, updating location is a lightweight, metadata-only operation
>>> which only impacts future writes, and existing absolute paths continue to
>>> resolve correctly for read. In v4, this is no longer the case. A location
>>> update becomes a breaking change that requires physical file movement to
>>> maintain correctness. From what I can tell, a catalog can either validate
>>> and handle the movement, rewrite paths to absolute, or reject the update to
>>> make location effectively immutable. Understandably, the Iceberg spec does
>>> not want to prescribe catalog guidance, but should we acknowledge this
>>> behavior change and document the lifecycle cleanup implications? It would
>>> be great if we can discuss further before the spec is finalized.
>>>
>>> Thanks,
>>> Steve Zhang
>>>
>>> On Thu, Jan 29, 2026 at 5:48 PM Talat Uyarer via dev <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We had a productive meeting today regarding the Relative Paths proposal.
>>>>
>>>> We've reached a general agreement on the approach. The changes will
>>>> involve explicitly defining path terminology (such as "absolute location")
>>>> and should be well-contained within a new section of the Table Spec.
>>>>
>>>> The next step is to open a PR with the proposed changes, which may
>>>> include knock-on effects for the REST specification, such as updates to
>>>> register table and load table requests.
>>>>
>>>> If you'd like to access the meeting notes:
>>>> https://docs.google.com/document/d/1t0RxrK-nsCT83zXeD66kmGx_TMU2X8_xfN1A_k6dCV0/edit?usp=sharing
>>>>
>>>> You can find the recording here:
>>>> https://drive.google.com/file/d/11q65achM_3vCfaEVYsxmfAdbKQJb2drA/view?usp=sharing
>>>>
>>>> Thanks, everyone
>>>>
>>>> Talat
>>>>
>>>> On Fri, Aug 1, 2025 at 10:50 AM Wing Yew Poon <[email protected]> wrote:
>>>>
>>>>> Dan,
>>>>> Thanks for the clarifications.
>>>>> Looking forward to the sync.
>>>>> - Wing Yew
>>>>>
>>>>> On Fri, Aug 1, 2025 at 8:43 AM Daniel Weeks <[email protected]> wrote:
>>>>>
>>>>>> Hey Wing Yew,
>>>>>>
>>>>>>> I see that you have been updating the Google doc containing the
>>>>>>> proposal.
>>>>>>
>>>>>> That's correct, I've been working with Talat to update the doc based
>>>>>> on feedback from the comments and first round of discussion we had on
>>>>>> this topic.
>>>>>>
>>>>>>> Looking through it now, as far as I can tell, the basic idea (from
>>>>>>> the original proposal) of inferring the table location from the path
>>>>>>> to the current metadata.json has not changed. Is my reading correct?
>>>>>>
>>>>>> So far, nothing has changed about table location inference, but we
>>>>>> will probably be revisiting this with respect to other
>>>>>> updates/clarifications. There are still a couple of open comments
>>>>>> related to this point, but it is one of the main goals.
>>>>>>
>>>>>>> You have added clarification around how the path to the metadata is
>>>>>>> constructed from table location (from which the table location is thus
>>>>>>> reverse engineered) and around path relativization, but the original
>>>>>>> idea does not appear to have changed. In that case, the use case of
>>>>>>> having a single copy of metadata but more than one copy of data (two
>>>>>>> or more locations) is not supported by the proposal.
>>>>>>> This was the sticking point in the last sync to discuss the proposal.
>>>>>>
>>>>>> I don't believe this was the sticking point from the original
>>>>>> discussion. Having multiple copies/locations of the same data files
>>>>>> under a single table's management is explicitly a non-goal. It was
>>>>>> discussed in the comments of the doc for caching/fallback use cases,
>>>>>> but I think that's better handled by specific engine/fileio
>>>>>> implementations.
>>>>>>
>>>>>> The main sticking points were confusion around the complexity of how
>>>>>> paths are constructed/persisted and the interplay between
>>>>>> table/metadata/data locations depending on how those values are set in
>>>>>> the table metadata. Based on that feedback, we're suggesting some
>>>>>> changes, which primarily consist of: 1) defining path construction,
>>>>>> resolution, and relativization separately, 2) making all paths relative
>>>>>> to the table location (which simplifies resolution/relativization),
>>>>>> 3) addressing confusing/complex issues like path separators and
>>>>>> expectations around separators.
>>>>>>
>>>>>> We're still in the process of updating the document, but we will
>>>>>> schedule another sync to discuss these updates in detail and address a
>>>>>> few points that are still outstanding.
>>>>>>
>>>>>> Thanks,
>>>>>> Dan
>>>>>>
>>>>>> On Thu, Jul 31, 2025 at 5:47 PM Wing Yew Poon <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Daniel Weeks,
>>>>>>> I see that you have been updating the Google doc containing the
>>>>>>> proposal.
>>>>>>> Looking through it now, as far as I can tell, the basic idea (from
>>>>>>> the original proposal) of inferring the table location from the path
>>>>>>> to the current metadata.json has not changed. Is my reading correct?
>>>>>>> You have added clarification around how the path to the metadata is
>>>>>>> constructed from table location (from which the table location is thus
>>>>>>> reverse engineered) and around path relativization, but the original
>>>>>>> idea does not appear to have changed. In that case, the use case of
>>>>>>> having a single copy of metadata but more than one copy of data (two
>>>>>>> or more locations) is not supported by the proposal. This was the
>>>>>>> sticking point in the last sync to discuss the proposal.
>>>>>>> Do you intend to have another sync to continue the discussion?
>>>>>>> Thanks,
>>>>>>> Wing Yew
>>>>>>>
>>>>>>> On Thu, Jul 10, 2025 at 4:41 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Kevin, yes, I see the recording link too but don’t have
>>>>>>>> access. I have requested access.
>>>>>>>>
>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>
>>>>>>>> On Jul 10, 2025, at 2:43 PM, Kevin Liu <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Yes, it was recorded. Dan or Talat should have the recording. I see
>>>>>>>> there's already a link for the recording associated with the gcal
>>>>>>>> event but I don't have access to it.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>> On Thu, Jul 10, 2025 at 12:37 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey folks, was the sync recorded? I missed it due to calendar sync
>>>>>>>>> issues :(
>>>>>>>>>
>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>
>>>>>>>>> On Jul 7, 2025, at 6:27 PM, ally heev <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Thanks. I can see it now
>>>>>>>>>
>>>>>>>>> On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I can see the new event on the dev calendar.
>>>>>>>>>> [image: Screenshot 2025-07-07 at 12.04.08 PM.png]
>>>>>>>>>>
>>>>>>>>>> Subscribe to the "Iceberg Dev Events" calendar here:
>>>>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Ally (and everyone else).
>>>>>>>>>>>
>>>>>>>>>>> We hadn't scheduled the discussion for relative paths, but I
>>>>>>>>>>> just added an event to the dev calendar for Thursday at 9am (PT).
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you still don't see it on the calendar.
>>>>>>>>>>>
>>>>>>>>>>> -Dan
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Talat
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the update. I will do a new pass on the doc.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 28, 2025 at 12:13 AM Talat Uyarer <[email protected]> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi, Iceberg Community,
>>>>>>>>>>>> >
>>>>>>>>>>>> > As mentioned at the last sync, Dan and I have been working on
>>>>>>>>>>>> > a proposal to add support for relative paths, which has been a
>>>>>>>>>>>> > long requested feature. There have been a number of
>>>>>>>>>>>> > discussions/proposals over the years, but we'd like to scope
>>>>>>>>>>>> > down and refocus effort to make some meaningful progress on
>>>>>>>>>>>> > this issue.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Please take a look at the linked doc and provide feedback.
>>>>>>>>>>>> > We'd love to open up discussion on this topic at the next
>>>>>>>>>>>> > community sync and we can hold one-off syncs on the topic if
>>>>>>>>>>>> > there's a lot of interest.
>>>>>>>>>>>> >
>>>>>>>>>>>> > You can access Iceberg's First V4 Spec change from here :)
>>>>>>>>>>>> >
>>>>>>>>>>>> > Proposal Issue: https://github.com/apache/iceberg/issues/13141
>>>>>>>>>>>> > Doc: https://s.apache.org/iceberg-spec-relative-path
>>>>>>>>>>>> >
>>>>>>>>>>>> > Talat
