+1 on adding a flag to support the Spark REST client behavior change between v1.8 and v2.0.
At the same time, we could further clarify the behavior of the DropTable REST API:
https://github.com/apache/iceberg/blob/feed4e2544b5839fbc2fe040965af3906d053302/open-api/rest-catalog-open-api.yaml#L1099-L1099
Something like: We don't recommend that clients/engines delete table files when dropping a table managed by a REST catalog.

Yufei

On Wed, Dec 11, 2024 at 10:21 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

> Hi Y'all!
>
> Today we had a little discussion on the Apache Iceberg Catalog Community Sync
> about DROP and DROP WITH PURGE. Currently the SparkCatalog implementation
> inside of the reference library has a unique method of DROP WITH PURGE vs
> other implementations. The pseudo code is essentially
>
> ```
> use Spark to list files to be removed and delete them
> send a drop table request to the Catalog
> ```
>
> As opposed to other systems
>
> ```
> send a drop table request to the Catalog with the purge flag enabled
> ```
>
> This has led us to a situation where it becomes difficult for REST Catalogs
> with custom purge implementations (or those that ignore purge) to
> work properly with Spark.
>
> Bringing this behavior in line with non-Spark implementations
> would have possibly dramatic impacts on users of the
> iceberg library, but our consensus in the Catalog Sync today was that we
> should eventually have that be the default behavior. To this end I propose
> the following:
>
> - We support a flag to allow current Spark users to delegate to the
>   REST Catalog (all other catalog behaviors remain the same). PR available
>   here <https://github.com/apache/iceberg/pull/11317>
>   (*Credit to Tobias, who wrote the PR and brought up this topic*)
> - We deprecate the client side delete for Spark
> - In the next major release (Iceberg 2.0?) we change the behavior
>   officially <https://github.com/apache/iceberg/issues/11754> to only
>   send through the Drop Purge flag with no client side file removal.
> - For all non-REST catalog implementations we keep the code the same
>   for legacy compatibility.
>
> A user of 1.8 will then have the ability to choose, for their Spark DROP
> PURGEs, whether to purge locally or remotely for REST.
>
> A user of 2.0 will only be able to do a remote purge.
>
> Users of non-REST Catalogs will have no change in behavior.
>
> Thanks for your consideration,
> Russ
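
To make the two flows in Russell's pseudo code concrete, here is a minimal Python sketch of what the proposed flag would switch between. Everything in it is hypothetical and only for illustration: `InMemoryRestCatalog`, `drop_with_purge`, and the `delegate_to_catalog` parameter are made-up names, not Iceberg or Spark APIs, and storage is modeled as a plain set of paths. It also assumes the client-side path sends its drop request without the purge flag, which the pseudo code does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRestCatalog:
    """Hypothetical stand-in for a REST catalog; not an Iceberg API."""

    tables: dict = field(default_factory=dict)    # table name -> set of file paths
    requests: list = field(default_factory=list)  # (table, purge) drop requests received

    def drop_table(self, name: str, purge: bool) -> None:
        # Record the request, then forget the table. A real catalog would
        # decide here what "purge" means (delete now, defer, or ignore).
        self.requests.append((name, purge))
        self.tables.pop(name)


def drop_with_purge(catalog, storage, name, delegate_to_catalog):
    """Sketch of DROP ... PURGE under the proposed (hypothetical) flag."""
    if delegate_to_catalog:
        # Proposed behavior: only send the drop request with purge enabled.
        catalog.drop_table(name, purge=True)
    else:
        # Current Spark behavior per the pseudo code: the client removes
        # the files itself, then sends a drop request to the catalog.
        for path in sorted(catalog.tables[name]):
            storage.discard(path)
        catalog.drop_table(name, purge=False)  # assumption: no purge flag here


if __name__ == "__main__":
    storage = {"s3://b/t1/a.parquet", "s3://b/t2/b.parquet"}
    catalog = InMemoryRestCatalog(
        tables={"t1": {"s3://b/t1/a.parquet"}, "t2": {"s3://b/t2/b.parquet"}}
    )
    drop_with_purge(catalog, storage, "t1", delegate_to_catalog=False)
    drop_with_purge(catalog, storage, "t2", delegate_to_catalog=True)
    print(catalog.requests)
    print(sorted(storage))
```

In the delegated case the client never touches storage, so whatever happens to the files of `t2` is entirely up to the catalog. That is the property that would let REST catalogs with custom (or no-op) purge implementations work properly with Spark.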