That plan sounds good to me. Thanks, Russell!

On Wed, Dec 11, 2024 at 1:43 PM Yufei Gu <flyrain...@gmail.com> wrote:
> +1 on adding a flag to support the Spark REST client behavior change
> between v1.8 and v2.0.
>
> At the same time, we may further clarify the behavior of the DropTable
> REST API,
> https://github.com/apache/iceberg/blob/feed4e2544b5839fbc2fe040965af3906d053302/open-api/rest-catalog-open-api.yaml#L1099-L1099
> .
>
> Something like: We don't recommend that clients/engines delete table
> files when dropping a table managed by a REST catalog.
>
> Yufei
>
>
> On Wed, Dec 11, 2024 at 10:21 AM Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> Hi Y'all!
>>
>> Today we had a little discussion on the Apache Iceberg Catalog Community
>> Sync about DROP and DROP WITH PURGE. Currently the SparkCatalog
>> implementation inside of the reference library has a unique method of
>> DROP WITH PURGE vs other implementations. The pseudo code is essentially
>>
>> ```
>> use Spark to list files to be removed and delete them
>> send a drop table request to the Catalog
>> ```
>>
>> As opposed to other systems
>>
>> ```
>> send a drop table request to the Catalog with the purge flag enabled
>> ```
>>
>> This has led us to a situation where it becomes difficult for REST
>> Catalogs with custom purge implementations (or those that ignore purge)
>> to work properly with Spark.
>>
>> Bringing this behavior in line with non-Spark implementations would have
>> possibly dramatic impacts on users of the Iceberg library, but our
>> consensus in the Catalog Sync today was that we should eventually have
>> that be the default behavior. To this end I propose the following:
>>
>>    - We support a flag to allow current Spark users to delegate to the
>>    REST Catalog (all other catalog behaviors remain the same). PR
>>    available here <https://github.com/apache/iceberg/pull/11317>
>>    (*Credit to Tobias, who wrote the PR and brought up this topic*)
>>    - We deprecate the client side delete for Spark
>>    - In the next major release (Iceberg 2.0?) we change the behavior
>>    officially <https://github.com/apache/iceberg/issues/11754> to only
>>    send through the Drop Purge flag with no client side file removal.
>>    - For all non-REST catalog implementations we keep the code the same
>>    for legacy compatibility.
>>
>> A user of 1.8 will then have the ability to choose, for their Spark DROP
>> PURGEs, whether to purge locally or remotely for REST.
>>
>> A user of 2.0 will only be able to do a remote purge.
>>
>> Users of non-REST Catalogs will have no change in behavior.
>>
>> Thanks for your consideration,
>> Russ
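
For anyone skimming the thread later, here is a rough Python sketch of the two DROP WITH PURGE flows from the pseudo code in the quoted proposal. It is only a toy model, not Iceberg or Spark code: `FakeRestCatalog`, `drop_with_purge`, `purge_requested`, and the `delegate_to_catalog` switch are hypothetical names made up for illustration, and the assumption that the client-side path sends its drop request without the purge flag is read off the pseudo code rather than checked against the implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRestCatalog:
    """Hypothetical in-memory stand-in for a REST catalog (not an Iceberg API)."""

    tables: dict = field(default_factory=dict)    # table name -> set of file paths
    requests: list = field(default_factory=list)  # log of (table, purge_requested)

    def drop_table(self, name, purge_requested):
        # A real catalog decides what "purge" means (custom purge, ignore, ...).
        # This stand-in only records the request and forgets the table.
        self.requests.append((name, purge_requested))
        self.tables.pop(name, None)


def drop_with_purge(catalog, storage, name, delegate_to_catalog):
    """Toy model of a Spark DROP ... PURGE against a REST catalog.

    `delegate_to_catalog` is a hypothetical stand-in for the proposed flag.
    `storage` is a set of file paths standing in for the object store.
    """
    if delegate_to_catalog:
        # Proposed behavior: no client-side file removal, only the drop
        # request with the purge flag enabled.
        catalog.drop_table(name, purge_requested=True)
    else:
        # Current behavior per the pseudo code: the client lists and deletes
        # the files itself, then sends a drop request to the catalog.
        for path in list(catalog.tables.get(name, ())):
            storage.discard(path)
        catalog.drop_table(name, purge_requested=False)
```

With the switch off, the client empties `storage` and the catalog only ever sees a plain drop; with it on, `storage` is left untouched and the catalog receives the purge request and is free to handle it however it likes.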