That plan sounds good to me. Thanks, Russell!

On Wed, Dec 11, 2024 at 1:43 PM Yufei Gu <flyrain...@gmail.com> wrote:

> +1 on adding a flag to support the Spark REST client behavior change
> between v1.8 and v2.0.
>
> At the same time, we may want to further clarify the behavior of the
> DropTable REST API:
> https://github.com/apache/iceberg/blob/feed4e2544b5839fbc2fe040965af3906d053302/open-api/rest-catalog-open-api.yaml#L1099-L1099
>
> Something like: We don't recommend that clients/engines delete table files
> when dropping a table managed by a REST catalog.
>
> Yufei
>
>
> On Wed, Dec 11, 2024 at 10:21 AM Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> Hi Y'all!
>>
>> Today we had a little discussion on the Apache Iceberg Catalog Community
>> Sync about DROP and DROP WITH PURGE. Currently, the SparkCatalog
>> implementation inside the reference library handles DROP WITH PURGE
>> differently from other implementations. The pseudocode is essentially:
>>
>>
>> ```
>> use Spark to list files to be removed and delete them
>> send a drop table request to the Catalog
>> ```
>>
>> As opposed to other systems:
>>
>> ```
>> send a drop table request to the Catalog with the purge flag enabled
>> ```
>>
>> This has led us to a situation where it is difficult for REST Catalogs
>> with custom purge implementations (or those that ignore purge) to
>> work properly with Spark.
>>
>> Bringing this behavior in line with non-Spark implementations could have
>> dramatic impacts on users of the Iceberg library, but our consensus in the
>> Catalog Sync today was that we should eventually make that the default
>> behavior. To this end I propose the following:
>>
>>
>>    - We support a flag to allow current Spark users to delegate purge to
>>    the REST Catalog (all other catalog behaviors remain the same). PR
>>    available here <https://github.com/apache/iceberg/pull/11317>
>>    (*Credit to Tobias, who wrote the PR and brought up this topic*)
>>    - We deprecate the client-side delete for Spark.
>>    - In the next major release (Iceberg 2.0?) we change the behavior
>>    officially <https://github.com/apache/iceberg/issues/11754> to only
>>    send through the Drop Purge flag with no client-side file removal.
>>    - For all non-REST catalog implementations we keep the code the same
>>    for legacy compatibility.
>>
>>
>> A user of 1.8 will then be able to choose, for Spark DROP WITH PURGE
>> against a REST catalog, whether to purge locally or remotely.
>>
>> A user of 2.0 will only be able to do a remote purge.
>>
>> Users of non-REST Catalogs will have no change in behavior.
>>
>>
>> Thanks for your consideration,
>> Russ
>>
>
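
The proposal quoted above can be sketched roughly as follows. This is only an illustration, not Iceberg or Spark code: `RecordingCatalog`, `purge_mode`, `drop_with_purge`, and the `delegate_flag` parameter are hypothetical names made up for this sketch, and it assumes that the current Spark flow sends its drop request without the purge flag after deleting files itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class RecordingCatalog:
    """Hypothetical stand-in for a catalog client; it only records requests."""
    requests: List[Tuple[str, bool]] = field(default_factory=list)

    def drop_table(self, name: str, purge: bool) -> None:
        self.requests.append((name, purge))


def purge_mode(is_rest: bool, major_version: int, delegate_flag: bool) -> str:
    """Return "client" or "catalog", following the plan in the quoted message."""
    if not is_rest:
        return "client"   # non-REST catalogs: no change in behavior
    if major_version >= 2:
        return "catalog"  # 2.0: remote purge only
    # 1.8 with a REST catalog: the user chooses via the (hypothetical) flag
    return "catalog" if delegate_flag else "client"


def drop_with_purge(
    catalog: RecordingCatalog,
    name: str,
    files: Iterable[str],
    delete_file: Callable[[str], None],
    mode: str,
) -> None:
    """Run DROP WITH PURGE in the given mode."""
    if mode == "catalog":
        # Delegated flow: one request with the purge flag, no client-side deletes.
        catalog.drop_table(name, purge=True)
        return
    # Current Spark flow: the client removes the files, then drops the table.
    for path in files:
        delete_file(path)
    # Assumption: the drop that follows is sent without the purge flag.
    catalog.drop_table(name, purge=False)
```

Calling `drop_with_purge(catalog, "db.t", files, delete_file, purge_mode(True, 1, True))` would send a single purge-enabled drop and leave file cleanup entirely to the catalog, which is the case that matters for REST catalogs with custom purge handling.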
