Hi everyone,

Does anyone know why, across catalog implementations, dropping a table
with *purge=true* deletes only the latest metadata file and the files it
references, but none of the previous metadata files? For example:

*create iceberg table1*; <--- metadata.json-1
*insert into table1* ...; <--- metadata.json-2

When I run *drop table1* after these two commands, `metadata.json-1` is
not deleted. This also means that if we roll back or compact the table and
then drop it, data files referenced only by earlier metadata files are not
deleted either.

I know the community has discussed table location ownership for file
cleanup after dropping a table (e.g.
https://github.com/apache/iceberg/issues/1764
https://github.com/trinodb/trino/issues/5616 ), but I'm not sure that
would completely solve the problem, since metadata and data locations can
be customized. I also think we should still delete past metadata.json
files even if the table doesn't own any location.

I was thinking about the following options:
1. Change drop table with *purge=true* to also delete past metadata.json
files (small change, but doesn't address data files orphaned by
rollback/compaction)
2. Add a configuration for table location ownership, and delete the
underlying files on drop table if the table owns its location (more
complicated)
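
To make item 1 concrete, here is a rough sketch in Python (not actual
Iceberg code). It assumes the current metadata.json has already been parsed
into a dict and relies on the `metadata-log` / `metadata-file` fields from
the table spec. The function names, the `delete_file` callback (a stand-in
for the catalog's file IO), and the bucket paths are all hypothetical.

```python
def metadata_files_to_purge(current_metadata_location, table_metadata):
    """Collect the current metadata file plus every file in its metadata log.

    table_metadata is the parsed content of the current metadata.json.
    Entries are returned oldest first, without duplicates.
    """
    previous = [
        entry["metadata-file"]
        for entry in table_metadata.get("metadata-log", [])
    ]
    seen = set()
    ordered = []
    for path in previous + [current_metadata_location]:
        if path not in seen:
            seen.add(path)
            ordered.append(path)
    return ordered


def purge_metadata_files(paths, delete_file):
    """Best-effort delete: keep going on failure and report what was left."""
    failed = []
    for path in paths:
        try:
            delete_file(path)
        except OSError:
            failed.append(path)
    return failed


# Hypothetical state after "create table1" and "insert into table1":
# metadata.json-2 is current and its log points back at metadata.json-1.
current = "s3://bucket/table1/metadata/metadata.json-2"
parsed = {
    "metadata-log": [
        {
            "timestamp-ms": 1,
            "metadata-file": "s3://bucket/table1/metadata/metadata.json-1",
        },
    ],
}

deleted = []
paths = metadata_files_to_purge(current, parsed)
failed = purge_metadata_files(paths, deleted.append)
```

One caveat with this approach: the metadata log is bounded by
`write.metadata.previous-versions-max`, so metadata files that have already
rolled off the log would still be left behind.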

I think option 1 should be relatively safe even though it's a behavior
change, but I want to run it by the community first.

Thanks!
Yan
