I think in the situation you're demonstrating, the manifests are split across two different snapshots.
Here's an example:

create table t1 (s string);
insert into t1 values ('foo');   -- snapshot 0, manifest-list with 1 manifest pointing to file A (ADDED)
insert into t1 values ('bar');   -- snapshot 1, manifest-list with 2 manifests pointing to file A (ADDED), file B (ADDED)
delete from t1 where s = 'foo';  -- snapshot 2, manifest-list with 2 manifests pointing to file A (DELETED), file B (ADDED)

The paths are not unique across snapshots 1/2, but within each snapshot they are.

Now, in the same case, if the data was in the same file, you would have a rewrite of the data file like this (assuming no row-level deletes):

create table t1 (s string);
insert into t1 values ('foo'), ('bar');  -- snapshot 0, manifest-list with 1 manifest pointing to file A (ADDED)
delete from t1 where s = 'foo';          -- snapshot 1, manifest-list with 1 manifest pointing to file A (DELETED) + file B (ADDED)

I hope I'm understanding your example correctly, but let me know if I'm off track here.

Thanks,
Dan

On Fri, Mar 4, 2022 at 2:23 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Dan,
> Thanks for the quick reply.
>
>
>> For #2, the answer follows mostly because if the answer to #1 holds, then
>> yes the pairwise intersection of entries in the manifest files of a given
>> snapshot is empty.
>
>
> Just to be pedantic: even with unique file names, it seems one could
> construct a snapshot as:
> Manifest 1: Add File A
> Manifest 2: Delete File A
>
> From your answer it sounds like this is unexpected and readers generally
> don't try to reconcile Deletes and Adds?
>
> Thanks,
> Micah
>
> On Fri, Mar 4, 2022 at 2:10 PM Daniel Weeks <dwe...@apache.org> wrote:
>
>> Hey Micah,
>>
>> For #1, I don't believe the spec clearly calls out that all data/delete files
>> must be unique, but the requirements for cleanup would be violated in
>> certain cases if you had the same file referenced in multiple manifests.
>> In practice, the best way to ensure data correctness and metadata
>> consistency is to ensure that all referenced files have unique locations
>> and that those locations do not get overwritten.
>>
>> For #2, the answer follows mostly because if the answer to #1 holds, then
>> yes the pairwise intersection of entries in the manifest files of a given
>> snapshot is empty.
>>
>> The Java library does perform some checks to prevent a file from being
>> added to the same manifest multiple times, but I don't think that
>> extends to all possible ways of adding files. So it may be possible, but
>> not a good idea.
>>
>> Sam might know if there's a way to add a nav for the format page (it is a
>> little difficult to navigate at the moment).
>>
>> -Dan
>>
>> On Thu, Mar 3, 2022 at 4:49 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Hi Iceberg Dev,
>>> I tried searching for it in the specification but couldn't find anything
>>> explicit:
>>>
>>> 1. Is it assumed that all data files and delete files will always have
>>> globally unique names in a table?
>>> 2. Is it expected that the pairwise intersection of all manifest files
>>> in a snapshot is empty (i.e., any given data file has exactly zero or
>>> one entries across all manifest files in a snapshot)?
>>>
>>> I think the uniqueness of both can maybe be inferred from this sentence
>>> (but I'm not 100% sure):
>>>
>>>> When a file is replaced or deleted from the dataset, its manifest
>>>> entry fields store the snapshot ID in which the file was deleted and status
>>>> 2 (deleted). The file may be deleted from the file system when the snapshot
>>>> in which it was deleted is garbage collected, assuming that older snapshots
>>>> have also been garbage collected [1].
>>>
>>>
>>> Thanks,
>>> Micah
>>>
>>>
>>> P.S. Is there a way to add a table of contents to the specification? I
>>> might be missing it, but I don't see one rendered at:
>>> https://iceberg.apache.org/spec/
>>>
>>
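A minimal sketch of the property asked about in question #2, assuming a simplified model in which a snapshot is a list of manifests and each manifest is a list of (path, status) entries. The names `ManifestEntry`, `duplicate_paths`, and `live_files` are hypothetical, and this is not how the Java library performs its checks; only the status codes (0 EXISTING, 1 ADDED, 2 DELETED) come from the spec.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Manifest entry status codes as defined in the Iceberg spec.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical, simplified entry: real manifest entries carry many more fields.
    path: str
    status: Status


# Simplified model: a manifest is a list of entries, a snapshot a list of manifests.
Manifest = list[ManifestEntry]
Snapshot = list[Manifest]


def duplicate_paths(snapshot: Snapshot) -> set[str]:
    """Return paths that appear in more than one entry across the snapshot.

    An empty result means the pairwise intersection of the snapshot's
    manifests is empty (question #2 in the thread).
    """
    counts = Counter(e.path for manifest in snapshot for e in manifest)
    return {path for path, n in counts.items() if n > 1}


def live_files(snapshot: Snapshot) -> set[str]:
    """Naive reader: every path with a non-DELETED entry is treated as live,
    without reconciling entries for the same path across manifests."""
    return {
        e.path
        for manifest in snapshot
        for e in manifest
        if e.status != Status.DELETED
    }


# Snapshot 2 from the first SQL example: A deleted in one manifest, B added in another.
example_snapshot_2 = [
    [ManifestEntry("A", Status.DELETED)],
    [ManifestEntry("B", Status.ADDED)],
]
# The snapshot from Micah's question: the same path appears in two manifests.
conflicting_snapshot = [
    [ManifestEntry("A", Status.ADDED)],
    [ManifestEntry("A", Status.DELETED)],
]

print(duplicate_paths(example_snapshot_2), live_files(example_snapshot_2))  # set() {'B'}
print(duplicate_paths(conflicting_snapshot), live_files(conflicting_snapshot))  # {'A'} {'A'}
```

In the conflicting snapshot, a reader that does not reconcile entries across manifests reports file A as live even though another manifest marks it deleted, which illustrates why keeping each path to at most one entry per snapshot matters.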