+1 on adding a flag to support the Spark REST client behavior change
between v1.8 and v2.0.

At the same time, we could further clarify the behavior of the DropTable
REST API:
https://github.com/apache/iceberg/blob/feed4e2544b5839fbc2fe040965af3906d053302/open-api/rest-catalog-open-api.yaml#L1099-L1099

Something like: We don't recommend that clients/engines delete table files
when dropping a table managed by a REST catalog.
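
To make the two paths concrete, here is a rough sketch of how such a flag
could switch between them. Everything in it is a stand-in for illustration:
FakeRestCatalog, FakeEngine, and the delegate_purge_to_catalog flag are
hypothetical names, not the ones used in the PR, and sending
purgeRequested=false after a client-side delete is my assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeRestCatalog:
    """Hypothetical stand-in for a REST catalog; records drop requests."""
    requests: List[dict] = field(default_factory=list)

    def drop_table(self, table: str, purge_requested: bool) -> None:
        # Models a DropTable call carrying the purge flag.
        self.requests.append({"table": table, "purgeRequested": purge_requested})


@dataclass
class FakeEngine:
    """Hypothetical stand-in for Spark; records client-side file deletes."""
    deleted_files: List[str] = field(default_factory=list)

    def delete_table_files(self, files: List[str]) -> None:
        self.deleted_files.extend(files)


def drop_with_purge(catalog: FakeRestCatalog, engine: FakeEngine, table: str,
                    table_files: List[str],
                    delegate_purge_to_catalog: bool) -> None:
    """Drop a table with purge, either delegating to the catalog or not."""
    if delegate_purge_to_catalog:
        # Proposed path: only send the drop request with the purge flag set;
        # the catalog decides how (or whether) to remove files.
        catalog.drop_table(table, purge_requested=True)
    else:
        # Current Spark path: delete files client-side, then drop the table.
        # Assumption: the purge flag is not set since files are already gone.
        engine.delete_table_files(table_files)
        catalog.drop_table(table, purge_requested=False)


if __name__ == "__main__":
    catalog, engine = FakeRestCatalog(), FakeEngine()
    drop_with_purge(catalog, engine, "db.t", ["data/a.parquet"], True)
    print(catalog.requests, engine.deleted_files)
```

With the flag on, the engine never touches table files, which is what lets
catalogs with custom purge implementations (or ones that ignore purge) behave
as intended.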

Yufei


On Wed, Dec 11, 2024 at 10:21 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> Hi Y'all!
>
> Today we had a little discussion on the Apache Iceberg Catalog Community
> Sync about DROP and DROP WITH PURGE. Currently the SparkCatalog
> implementation inside of the reference library handles DROP WITH PURGE
> differently from other implementations. The pseudo code is essentially
>
>
> ```
> use Spark to list files to be removed and delete them
> send a drop table request to the Catalog
> ```
>
> As opposed to other systems
>
> ```
> send a drop table request to the Catalog with the purge flag enabled
> ```
>
> This has led us to a situation where it becomes difficult for REST Catalogs
> with custom purge implementations (or those that ignore purge) to
> work properly with Spark.
>
> Bringing this behavior in line with non-Spark implementations
> could have dramatic impacts on users of the iceberg library, but our
> consensus in the Catalog Sync today was that we should eventually have
> that be the default behavior. To this end I propose the following:
>
>
>    - We support a flag to allow current Spark users to delegate to the
>    REST Catalog
>    (all other catalog behaviors remain the same). PR available here
>    <https://github.com/apache/iceberg/pull/11317>
>    (*Credit to Tobias, who wrote the PR and brought up this topic*)
>    - We deprecate the client side delete for Spark
>    - In the next major release (Iceberg 2.0?) we change the behavior
>    officially <https://github.com/apache/iceberg/issues/11754> to only
>    send through the Drop Purge flag with no client side file removal.
>    - For all non-REST catalog implementations we keep the code the same
>    for legacy compatibility.
>
>
> A user of 1.8 will then be able to choose, for their Spark DROP WITH PURGE
> operations against a REST catalog, whether to purge locally or remotely.
>
> A user of 2.0 will only be able to do a remote purge.
>
> Users of non-REST Catalogs will have no change in behavior.
>
>
> Thanks for your consideration,
> Russ
>
