Yes, the Iceberg spec does not define where data and metadata should be
located. /data and /metadata are the default paths, but users can override
this behavior by providing a custom location provider or by setting
write.metadata.path explicitly.
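
To make the trade-off concrete, here is a rough Python sketch. It is only a
simulation over a set of path strings, not Iceberg code: the helper names
(list_under, orphan_candidates), the file names, and the /custom/meta
location standing in for an overridden write.metadata.path are all
hypothetical. It reuses the /dir1 layout from the example quoted below.

```python
def list_under(all_files, prefix):
    """Simulate a directory listing: every file located under `prefix`."""
    prefix = prefix.rstrip("/") + "/"
    return {f for f in all_files if f.startswith(prefix)}


def orphan_candidates(all_files, referenced, scan_dirs):
    """Files found under any scanned directory that the table does not reference."""
    found = set()
    for directory in scan_dirs:
        found |= list_under(all_files, directory)
    return found - referenced


# Hypothetical storage contents: tbl1 lives at /dir1/tbl1, tbl2 at /dir1.
storage = {
    "/dir1/tbl1/data/a.parquet",             # live file of tbl1
    "/dir1/tbl1/metadata/v1.metadata.json",  # live metadata of tbl1
    "/dir1/data/b.parquet",                  # live file of tbl2
    "/dir1/data/stale.parquet",              # real orphan of tbl2
    "/custom/meta/old.metadata.json",        # tbl2 orphan under an overridden metadata path
}
tbl2_referenced = {"/dir1/data/b.parquet"}

# Listing at the table root also picks up tbl1's live files as candidates.
root_scan = orphan_candidates(storage, tbl2_referenced, ["/dir1"])

# Listing only the default data/metadata directories spares tbl1,
# but never looks at the overridden metadata location.
scoped_scan = orphan_candidates(
    storage, tbl2_referenced, ["/dir1/data", "/dir1/metadata"]
)

# Deriving the directories from the table's configured locations
# (assumed here to be /dir1/data and /custom/meta) covers both.
configured_scan = orphan_candidates(
    storage, tbl2_referenced, ["/dir1/data", "/custom/meta"]
)

print(sorted(root_scan))
print(sorted(scoped_scan))
print(sorted(configured_scan))
```

In this toy setup the scoped listing protects tbl1, but the orphan under
/custom/meta is only found when the scan directories come from the table's
actual configuration rather than the defaults.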

On Wed, Feb 26, 2025 at 1:24 PM karuppayya <karuppayya1...@gmail.com> wrote:

> Hello Team,
>
> I'm writing to propose a change to the orphan file removal logic in this
> PR <https://github.com/apache/iceberg/pull/12278>.
>
> Currently, the orphan file removal process lists files at the root of the
> table to identify orphan files.
> This can lead to unintended consequences in scenarios where multiple
> tables share a common root directory.
> Example:
> *tbl1* -> */dir1/tbl1*
> *tbl2* -> */dir1*
> Orphan removal on tbl2 can clean up the tbl1 directory, since the listing
> happens at */dir1*.
>
> I propose modifying the orphan file removal logic to list specifically
> within the `data` and `metadata` directories of the target table. This
> would ensure that only files within those directories, and therefore
> directly associated with the table (in most cases), are considered for
> removal.
>
> Are there any potential drawbacks or edge cases that I haven't considered?
>
> *Note: *
> 1. This does not address scenarios where tables are nested within the
> `data` or `metadata` directories of another table.
> Example:
> *tbl1* -> dir/tbl1
> *tbl2* -> dir/tbl1/data/tbl2
> 2. When two tables have the same location.
> Some related discussions on location ownership are here
> <https://github.com/apache/iceberg/issues/4159> and here
> <https://github.com/apache/iceberg/issues/9133>.
>
> Eager to hear your feedback here or on the PR. Thank you!
>
> - Karuppayya
>
>
