Yes, the Iceberg spec does not define where data and metadata files should be located. /data and /metadata are the default paths, but users can override them by supplying a custom location provider or by setting write.metadata.path explicitly.
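
To make the concern concrete, here is a rough sketch of how the directories to list might be resolved if the defaults are not assumed. The helper name `listing_roots` is hypothetical and is not taken from the PR; `write.data.path` and `write.metadata.path` are table properties. The sketch cannot account for a custom location provider, which may place files anywhere, so treat it as an illustration of the edge case rather than a fix.

```python
from typing import Dict, List

# Iceberg table property names; everything else here is illustrative.
WRITE_DATA_PATH = "write.data.path"
WRITE_METADATA_PATH = "write.metadata.path"


def listing_roots(table_location: str, properties: Dict[str, str]) -> List[str]:
    """Return the directories an orphan-file scan would need to list.

    Hypothetical helper: falls back to <location>/data and
    <location>/metadata only when no override is set.
    """
    base = table_location.rstrip("/")
    data_dir = properties.get(WRITE_DATA_PATH, f"{base}/data")
    metadata_dir = properties.get(WRITE_METADATA_PATH, f"{base}/metadata")

    roots: List[str] = []
    for directory in (data_dir, metadata_dir):
        directory = directory.rstrip("/")
        if directory not in roots:  # avoid listing the same directory twice
            roots.append(directory)
    return roots


if __name__ == "__main__":
    # Default layout: both directories sit under the table location.
    print(listing_roots("/dir1/tbl1", {}))
    # Overridden metadata path: listing only <location>/metadata would miss it.
    print(listing_roots("/dir1/tbl1", {WRITE_METADATA_PATH: "/elsewhere/meta"}))
```

With an override in place, a scan restricted to the default `data` and `metadata` directories would never see the files written to the overridden path, so orphans there would go undetected.
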
On Wed, Feb 26, 2025 at 1:24 PM karuppayya <karuppayya1...@gmail.com> wrote:

> Hello Team,
>
> I'm writing to propose a change to the orphan file removal logic in this
> PR <https://github.com/apache/iceberg/pull/12278>.
>
> Currently, the orphan file removal process lists files at the root of the
> table to figure out orphans files.
> This can lead to unintended consequences in scenarios where multiple
> tables share a common root directory.
> Example:
> *tbl1* -> */dir1/*tbl1
> *tbl2* -> */dir1*
> Orphan removal of tbl2 can clean up the tbl1 directory since the listing
> happens at *dir1.*
>
> I propose modifying the orphan file removal logic to list specifically
> within the `data` and `metadata` directories of the target table. This
> would ensure that only files within those directories, and therefore
> directly associated with the table(in most cases), are considered for
> removal.
>
> Are there any potential drawbacks or edge cases that I haven't considered?
>
> *Note: *
> 1. This does not address scenarios where tables are nested within the
> `data` or `metadata` directories of another table.
> Example:
> *tbl1* -> dir/tbl1
> *tbl2* -> dir/tbl1/data/tbl2
> 2. When two tables have same location
> Some related discussions related to location ownership here
> <https://github.com/apache/iceberg/issues/4159> and here
> <https://github.com/apache/iceberg/issues/9133>
>
> Eager to hear your feedback here or on the PR. Thank you!.
>
> - Karuppayya