Hey everyone,

If you're interested in the first round of spec-related updates for relative paths, please take a look and add comments: https://github.com/apache/iceberg/pull/15630
-Dan

On Mon, Mar 23, 2026 at 1:04 PM Daniel Weeks <[email protected]> wrote:
> Hey Steve,
>
> I'm not sure if you were able to get an answer to this question in any of the follow-up discussions we had on relative paths, but the situation you describe is inherent to the difference between absolute and relative paths.
>
> The spec isn't responsible for how you relocate/duplicate/etc. data if the base component of the relative path is updated; that is explicitly not covered by the design. That's the responsibility of the catalog or implementation.
>
> If you want data persistence across metadata moves, you always have the ability to produce absolute paths to retain the v1-3 behavior. However, I believe what we've learned through production deployments, and in comparison to other formats, is that the primary use case is to either relocate the entire dataset or duplicate the entire dataset, which is the basis for the relative path model described in the proposal.
>
> As to catalog handling, most (all?) implementations either do not natively support rename (like HadoopCatalog) or treat rename as a metadata-only operation that does not change the table location. The closest thing is probably register table in the REST catalog, but that is very much left up to the catalog implementation. I think we can draw from this that most table relocations are performed outside of the catalog and then registered in the catalog.
>
> -Dan
>
>
> On Wed, Feb 4, 2026 at 2:27 PM Steve <[email protected]> wrote:
>> Thanks all,
>>
>> Following the relative path discussion last week, I want to raise a question about lifecycle cleanup operations in the context of table location mutability.
>> The current proposal established that "*the table location is the basis for all path resolution against persisted relative paths*".
>> Since location remains mutable, this creates a behavioral difference between v3 and v4 tables that increases operational complexity. Here's a concrete example:
>>
>> *Scenario*
>>
>> CREATE TABLE prod.db.events (
>>   event_id BIGINT,
>>   event_time TIMESTAMP,
>>   payload STRING)
>> USING iceberg
>> LOCATION 's3://bucket-a/warehouse/events';
>>
>> -- Insert some data
>> INSERT INTO prod.db.events VALUES (1, current_timestamp(), 'data1');
>> INSERT INTO prod.db.events VALUES (2, current_timestamp(), 'data2');
>>
>> -- User changes location (Spark)
>> ALTER TABLE prod.db.events SET LOCATION 's3://bucket-b/warehouse/events';
>>
>> -- Write new data
>> INSERT INTO prod.db.events VALUES (3, current_timestamp(), 'data3');
>>
>> *Result for v3 table on absolute paths*
>> Manifest entries:
>> - s3://bucket-a/warehouse/events/data/file1.parquet (absolute - old location)
>> - s3://bucket-a/warehouse/events/data/file2.parquet (absolute - old location)
>> - s3://bucket-b/warehouse/events/data/file3.parquet (absolute - new location)
>> Reads work out of the box because the paths are absolute.
>> Snapshot expiration covers both locations (before and after the change) because Iceberg metadata tracks each file's path as of its creation.
>> Orphan file removal is limited because it only respects the latest location.
>>
>> *Result for v4 table on relative paths*
>> Manifest entries:
>> - file1.parquet (relative - written when location was bucket-a)
>> - file2.parquet (relative - written when location was bucket-a)
>> - file3.parquet (relative - written when location is bucket-b)
>> Path resolution for file1.parquet:
>>   Resolved: s3://bucket-b/warehouse/events/data/file1.parquet ❌
>>   Actual:   s3://bucket-a/warehouse/events/data/file1.parquet
>> Reads will fail after the location change unless the files are physically moved (either by the catalog or by a background process).
>> Snapshot expiration and orphan removal will not cover locations used before the update.
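The resolution behavior in the scenario above can be replayed in a few lines of Python. This is a minimal sketch, not Iceberg's implementation: the helpers `is_absolute`, `resolve`, and `run_scenario` are hypothetical, and it assumes that relative paths are stored relative to the table location (e.g. `data/file1.parquet`, following the "all paths relative to the table location" direction described later in the thread) and that writers use the default `<location>/data/` layout.

```python
from urllib.parse import urlparse


def is_absolute(path: str) -> bool:
    # Hypothetical rule: a URI scheme (s3://, hdfs://, file://) or a leading "/" means absolute.
    return bool(urlparse(path).scheme) or path.startswith("/")


def resolve(stored: str, table_location: str) -> str:
    # Absolute paths are used verbatim (v1-3 behavior); relative paths are
    # joined onto whatever the table location is *now*.
    if is_absolute(stored):
        return stored
    return table_location.rstrip("/") + "/" + stored


def run_scenario(use_relative: bool) -> dict[str, str]:
    """Replay the example: two inserts, SET LOCATION, one more insert.

    Returns {physical path written: path a reader resolves afterwards}.
    """
    location = "s3://bucket-a/warehouse/events"
    manifest = []  # (physical path, stored path) pairs
    for name in ["file1", "file2", "file3"]:
        if name == "file3":
            # ALTER TABLE prod.db.events SET LOCATION 's3://bucket-b/warehouse/events'
            location = "s3://bucket-b/warehouse/events"
        relative = f"data/{name}.parquet"  # assumes the default <location>/data/ layout
        physical = f"{location}/{relative}"
        manifest.append((physical, relative if use_relative else physical))
    # Readers resolve every stored path against the current location (bucket-b).
    return {physical: resolve(stored, location) for physical, stored in manifest}


if __name__ == "__main__":
    for label, use_relative in [("v3 absolute", False), ("v4 relative", True)]:
        print(label)
        for physical, resolved in run_scenario(use_relative).items():
            status = "ok" if physical == resolved else "MISSING"
            print(f"  {status:7} {resolved}")
```

Running it reports all three files as `ok` in the absolute case, and `MISSING` for file1 and file2 in the relative case, matching the v3/v4 results listed above.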
>>
>> *Question*
>> In v1-3, updating the location is a lightweight, metadata-only operation that only impacts future writes, and existing absolute paths continue to resolve correctly for reads. In v4, this is no longer the case. A location update becomes a breaking change that requires physical file movement to maintain correctness. From what I can tell, a catalog can either validate and handle the movement, rewrite paths to absolute, or reject the update to make the location effectively immutable. Understandably, the Iceberg spec does not want to prescribe catalog behavior, but should we acknowledge this behavior change and document the lifecycle cleanup implications? It would be great if we could discuss this further before the spec is finalized.
>>
>> Thanks,
>> Steve Zhang
>>
>>
>> On Thu, Jan 29, 2026 at 5:48 PM Talat Uyarer via dev <[email protected]> wrote:
>>> Hi All,
>>>
>>> We had a productive meeting today regarding the Relative Paths proposal.
>>>
>>> We've reached a general agreement on the approach. The changes will involve explicitly defining path terminology (such as "absolute location") and should be well-contained within a new section of the Table Spec.
>>>
>>> The next step is to open a PR with the proposed changes, which may include knock-on effects for the REST specification, such as updates to register table and load table requests.
>>>
>>> If you'd like to access the meeting notes:
>>> https://docs.google.com/document/d/1t0RxrK-nsCT83zXeD66kmGx_TMU2X8_xfN1A_k6dCV0/edit?usp=sharing
>>>
>>> You can find the recording here:
>>> https://drive.google.com/file/d/11q65achM_3vCfaEVYsxmfAdbKQJb2drA/view?usp=sharing
>>>
>>> Thanks, everyone
>>>
>>> Talat
>>>
>>> On Fri, Aug 1, 2025 at 10:50 AM Wing Yew Poon <[email protected]> wrote:
>>>> Dan,
>>>> Thanks for the clarifications.
>>>> Looking forward to the sync.
>>>> - Wing Yew
>>>>
>>>>
>>>> On Fri, Aug 1, 2025 at 8:43 AM Daniel Weeks <[email protected]> wrote:
>>>>> Hey Wing Yew,
>>>>>
>>>>>> I see that you have been updating the Google doc containing the proposal.
>>>>>
>>>>> That's correct, I've been working with Talat to update the doc based on feedback from the comments and the first round of discussion we had on this topic.
>>>>>
>>>>>> Looking through it now, as far as I can tell, the basic idea (from the original proposal) of inferring the table location from the path to the current metadata.json has not changed. Is my reading correct?
>>>>>
>>>>> So far, nothing has changed about table location inference, but we will probably be revisiting this with respect to other updates/clarifications. There are still a couple of open comments related to this point, but it is one of the main goals.
>>>>>
>>>>>> You have added clarification around how the path to the metadata is constructed from table location (from which the table location is thus reverse engineered) and around path relativization, but the original idea does not appear to have changed. In that case, the use case of having a single copy of metadata but more than one copy of data (two or more locations) is not supported by the proposal. This was the sticking point in the last sync to discuss the proposal.
>>>>>
>>>>> I don't believe this was the sticking point from the original discussion. Having multiple copies/locations of the same data files under a single table's management is explicitly a non-goal. It was discussed in the comments of the doc for caching/fallback use cases, but I think that's better handled by specific engine/FileIO implementations.
>>>>>
>>>>> The main sticking points were confusion around the complexity of how paths are constructed/persisted and the interplay between table/metadata/data locations depending on how those values are set in the table metadata. Based on that feedback, we're suggesting some changes, which primarily consist of: 1) defining path construction, resolution, and relativization separately, 2) making all paths relative to the table location (which simplifies resolution/relativization), and 3) addressing confusing/complex issues like path separators and expectations around separators.
>>>>>
>>>>> We're still in the process of updating the document, but we will schedule another sync to discuss these updates in detail and address a few points that are still outstanding.
>>>>>
>>>>> Thanks,
>>>>> Dan
>>>>>
>>>>> On Thu, Jul 31, 2025 at 5:47 PM Wing Yew Poon <[email protected]> wrote:
>>>>>> Hi Daniel Weeks,
>>>>>> I see that you have been updating the Google doc containing the proposal.
>>>>>> Looking through it now, as far as I can tell, the basic idea (from the original proposal) of inferring the table location from the path to the current metadata.json has not changed. Is my reading correct?
>>>>>> You have added clarification around how the path to the metadata is constructed from table location (from which the table location is thus reverse engineered) and around path relativization, but the original idea does not appear to have changed. In that case, the use case of having a single copy of metadata but more than one copy of data (two or more locations) is not supported by the proposal. This was the sticking point in the last sync to discuss the proposal.
>>>>>> Do you intend to have another sync to continue the discussion?
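The two operations discussed in this exchange, inferring the table location from the path to the current metadata.json and relativizing persisted paths against that location, can be sketched as below. This is an illustrative Python sketch, not the proposal's text or any Iceberg library API: `infer_table_location`, `relativize`, and `resolve` are hypothetical names, the `<table-location>/metadata/<name>.metadata.json` layout and the `00002-abc.metadata.json` file name are assumptions, and the separator rules the proposal intends to define are ignored. It reflects the discussion as of mid-2025; the spec PR linked at the top of the thread is authoritative.

```python
from urllib.parse import urlparse

METADATA_DIR = "/metadata"  # assumed layout: <table-location>/metadata/<name>.metadata.json


def infer_table_location(metadata_json: str) -> str:
    """Hypothetical helper: derive the table location from the current metadata.json path."""
    parent, _, filename = metadata_json.rpartition("/")
    if not filename.endswith(".metadata.json") or not parent.endswith(METADATA_DIR):
        raise ValueError(f"unexpected metadata layout: {metadata_json}")
    return parent[: -len(METADATA_DIR)]


def relativize(path: str, table_location: str) -> str:
    """Persist paths under the table location relative to it; leave others absolute."""
    # Trailing "/" so that ".../events-old/..." is not treated as inside ".../events".
    prefix = table_location.rstrip("/") + "/"
    return path[len(prefix):] if path.startswith(prefix) else path


def resolve(stored: str, table_location: str) -> str:
    """Inverse of relativize: absolute paths pass through, relative ones are joined."""
    if urlparse(stored).scheme or stored.startswith("/"):
        return stored
    return table_location.rstrip("/") + "/" + stored


if __name__ == "__main__":
    original = infer_table_location("s3://bucket-a/warehouse/events/metadata/00002-abc.metadata.json")
    stored = relativize(original + "/data/file1.parquet", original)
    print(original, stored)
    # After copying the whole table directory, only the metadata.json path changes.
    copied = infer_table_location("s3://bucket-b/backup/events/metadata/00002-abc.metadata.json")
    print(resolve(stored, copied))
```

Because every relative path is resolved against the single inferred location, copying the whole table directory and loading the copied metadata.json is enough to read the copy (the relocate/duplicate case). The same property is why one copy of metadata cannot serve two copies of the data: there is only one location to resolve against.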
>>>>>> Thanks,
>>>>>> Wing Yew
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 10, 2025 at 4:41 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>> Thanks Kevin, yes, I see the recording link too but don't have access. I have requested access.
>>>>>>>
>>>>>>> ~ Anurag Mantripragada
>>>>>>>
>>>>>>> On Jul 10, 2025, at 2:43 PM, Kevin Liu <[email protected]> wrote:
>>>>>>>
>>>>>>> Yes, it was recorded. Dan or Talat should have the recording. I see there's already a link for the recording associated with the gcal event, but I don't have access to it.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Thu, Jul 10, 2025 at 12:37 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>> Hey folks, was the sync recorded? I missed it due to calendar sync issues :(
>>>>>>>>
>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>
>>>>>>>> On Jul 7, 2025, at 6:27 PM, ally heev <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Thanks. I can see it now.
>>>>>>>>
>>>>>>>> On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I can see the new event on the dev calendar.
>>>>>>>>> [image: Screenshot 2025-07-07 at 12.04.08 PM.png]
>>>>>>>>>
>>>>>>>>> Subscribe to the "Iceberg Dev Events" calendar here:
>>>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kevin Liu
>>>>>>>>>
>>>>>>>>> On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks <[email protected]> wrote:
>>>>>>>>>> Hey Ally (and everyone else),
>>>>>>>>>>
>>>>>>>>>> We hadn't scheduled the discussion for relative paths, but I just added an event to the dev calendar for Thursday at 9am (PT).
>>>>>>>>>>
>>>>>>>>>> Let me know if you still don't see it on the calendar.
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>>>>>>> Hi Talat,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the update. I will do a new pass on the doc.
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> JB
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 28, 2025 at 12:13 AM Talat Uyarer <[email protected]> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Hi, Iceberg Community,
>>>>>>>>>>> >
>>>>>>>>>>> > As mentioned at the last sync, Dan and I have been working on a proposal to add support for relative paths, which has been a long-requested feature. There have been a number of discussions/proposals over the years, but we'd like to scope down and refocus effort to make some meaningful progress on this issue.
>>>>>>>>>>> >
>>>>>>>>>>> > Please take a look at the linked doc and provide feedback. We'd love to open up discussion on this topic at the next community sync, and we can hold one-off syncs on the topic if there's a lot of interest.
>>>>>>>>>>> >
>>>>>>>>>>> > You can access Iceberg's first V4 spec change here :)
>>>>>>>>>>> >
>>>>>>>>>>> > Proposal Issue: https://github.com/apache/iceberg/issues/13141
>>>>>>>>>>> > Doc: https://s.apache.org/iceberg-spec-relative-path
>>>>>>>>>>> >
>>>>>>>>>>> > Talat
