Hi Y'all!

Today we had a little discussion on the Apache Iceberg Catalog Community Sync
about DROP and DROP WITH PURGE. Currently the SparkCatalog implementation in
the reference library handles DROP WITH PURGE differently than other
implementations. The pseudocode is essentially:


```
use Spark to list files to be removed and delete them
send a drop table request to the Catalog
```

As opposed to other implementations, which do:

```
send a drop table request to the Catalog with the purge flag enabled
```
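
For concreteness, here is a rough Java sketch of the two paths, using the
public Catalog API and the Spark delete-reachable-files action. This is just an
illustration of the flows above under those assumptions, not the actual
SparkCatalog code.

```
import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.spark.actions.SparkActions;

class PurgePathsSketch {
  // (1) Spark today: a Spark job deletes the reachable files, then a plain drop is sent.
  static void clientSidePurge(Catalog catalog, TableIdentifier ident) {
    Table table = catalog.loadTable(ident);
    String metadataLocation =
        ((HasTableOperations) table).operations().current().metadataFileLocation();
    SparkActions.get()
        .deleteReachableFiles(metadataLocation) // list and delete data/metadata files in Spark
        .io(table.io())
        .execute();
    catalog.dropTable(ident, false /* purge */);
  }

  // (2) Other implementations: a single request, letting the catalog do the purge.
  static void catalogSidePurge(Catalog catalog, TableIdentifier ident) {
    catalog.dropTable(ident, true /* purge */);
  }
}
```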

This has led to a situation where REST Catalogs with custom purge
implementations (or those that ignore the purge flag entirely) have a hard
time working properly with Spark.

Bringing this behavior in line with non-Spark implementations could have a
significant impact on users of the Iceberg library, but our consensus in the
Catalog Sync today was that it should eventually be the default behavior. To
this end, I propose the following:


   - We support a flag that lets current Spark users delegate the purge to the
   REST Catalog (all other catalog behaviors remain the same). PR available here
   <https://github.com/apache/iceberg/pull/11317>. (Credit to Tobias, who wrote
   the PR and brought up this topic.) See the configuration sketch after this list.
   - We deprecate the client-side delete for Spark.
   - In the next major release (Iceberg 2.0?) we officially change the behavior
   <https://github.com/apache/iceberg/issues/11754> so that Spark only sends the
   drop request with the purge flag and performs no client-side file removal.
   - For all non-REST catalog implementations we keep the code the same for
   backward compatibility.
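
Purely as an illustration of the 1.8 opt-in, here is what the user side could
look like with a REST catalog registered through SparkCatalog. The
purge-delegation property name below is a placeholder, not the actual flag
name; see the PR above for the real property and its semantics.

```
import org.apache.spark.sql.SparkSession;

public class PurgeFlagSketch {
  public static void main(String[] args) {
    // Placeholder property name for illustration only; the real flag is defined in the PR above.
    String purgeDelegationFlag = "spark.sql.catalog.rest_cat.<flag-from-pr-11317>";

    SparkSession spark = SparkSession.builder()
        .appName("drop-purge-sketch")
        // A REST catalog registered with the Iceberg SparkCatalog
        .config("spark.sql.catalog.rest_cat", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.rest_cat.type", "rest")
        .config("spark.sql.catalog.rest_cat.uri", "https://rest-catalog.example.com")
        // Opt in to delegating the purge to the REST catalog instead of deleting files in Spark
        .config(purgeDelegationFlag, "true")
        .getOrCreate();

    // With the flag enabled, this would translate into a single drop-with-purge
    // request to the catalog rather than a client-side file cleanup.
    spark.sql("DROP TABLE rest_cat.db.events PURGE");
  }
}
```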


A user of 1.8 will then be able to choose, for their Spark DROP ... PURGE
statements against a REST catalog, whether the purge happens locally
(client side) or remotely (on the catalog).

A user of 2.0 will only be able to do a remote (catalog-side) purge.

Users of non-REST Catalogs will have no change in behavior.


Thanks for your consideration,
Russ
