Hi Dan,

Thanks for the quick reply.

> For #2, the answer follows mostly because if the answer to #1 holds, then
> yes the pairwise intersection of entries in the manifest files of a given
> snapshot is empty.

Just to be pedantic, even with unique file names, it seems one could
construct a snapshot as:

Manifest 1: Add File A
Manifest 2: Delete File A

From your answer it sounds like this is unexpected, and readers generally
don't try to reconcile Deletes and Adds?

Thanks,
Micah

On Fri, Mar 4, 2022 at 2:10 PM Daniel Weeks <dwe...@apache.org> wrote:

> Hey Micah,
>
> For #1, I don't believe spec clearly calls out that all data/delete files
> must be unique, but the requirements for cleanup would be violated in
> certain cases if you had the same file referenced in multiple manifests.
> In practice, the best way to ensure data correctness and metadata
> consistency is to ensure that all referenced files have unique locations
> and that those locations do not get overwritten.
>
> For #2, the answer follows mostly because if the answer to #1 holds, then
> yes the pairwise intersection of entries in the manifest files of a given
> snapshot is empty.
>
> The java library does perform some checks to prevent a file from being
> added to the same manifest multiple times, but I don't think that
> extends to all possible ways of adding files. So it may be possible, but
> not a good idea.
>
> Sam might know if there's a way to add a nav for the format page (it is a
> little difficult to navigate at the moment).
>
> -Dan
>
> On Thu, Mar 3, 2022 at 4:49 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Iceberg Dev,
>> I tried searching for it in the specification but couldn't find anything
>> explicit:
>>
>> 1. Is it assumed that all data files and delete files will always have
>> globally unique names in a table?
>> 2. Is it expected that the pairwise intersection of all manifest files
>> in a snapshot is empty (i.e. For any given data file it has exactly zero or
>> 1 entries across all manifest files in a snapshot)?
>>
>> I think the uniqueness of both can maybe be inferred by this sentence
>> (but I'm not 100% sure):
>>
>>> When a file is replaced or deleted from the dataset, it’s manifest entry
>>> fields store the snapshot ID in which the file was deleted and status 2
>>> (deleted). The file may be deleted from the file system when the snapshot
>>> in which it was deleted is garbage collected, assuming that older snapshots
>>> have also been garbage collected [1].
>>
>>
>> Thanks,
>> Micah
>>
>>
>> P.S. Is there a way to add a table of contents to the specification. I
>> might be missing it but I don't see one rendered at:
>> https://iceberg.apache.org/spec/
>>
>
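
To make the scenario in this thread concrete (one manifest adding File A, another deleting it), here is a minimal sketch of a check for whether the manifests of a snapshot are pairwise disjoint. It is plain Python and does not use the Iceberg library: `ManifestEntry`, `find_shared_paths`, the manifest names, and the file path are hypothetical names made up for illustration. Only the status codes (0 existing, 1 added, 2 deleted) come from the spec.

```python
from collections import defaultdict
from dataclasses import dataclass

# Manifest entry status codes as listed in the Iceberg spec.
EXISTING, ADDED, DELETED = 0, 1, 2


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, simplified stand-in for a manifest entry."""
    status: int
    file_path: str


def find_shared_paths(manifests):
    """Return {file_path: [manifest names]} for every path that is
    referenced by more than one manifest in the snapshot."""
    seen = defaultdict(set)
    for name, entries in manifests.items():
        for entry in entries:
            seen[entry.file_path].add(name)
    return {path: sorted(names) for path, names in seen.items() if len(names) > 1}


# The case from this thread: unique file name, but two manifests reference it.
snapshot = {
    "manifest-1": [ManifestEntry(ADDED, "s3://bucket/data/file-a.parquet")],
    "manifest-2": [ManifestEntry(DELETED, "s3://bucket/data/file-a.parquet")],
}

shared = find_shared_paths(snapshot)
print(shared)
```

An empty result would mean the pairwise intersection is empty. A non-empty result, as in this example, is the kind of snapshot a reader would have to reconcile if the property is not guaranteed.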