Jack,

> no atomic drop table support: this seems pretty fixable, as you can change
> the semantics of dropping a table to be deleting the latest table version
> hint file, instead of having to delete everything in the folder. I feel
> that actually also fits the semantics of purge/no-purge better.


I would invite you to check out lisoda's PR
<https://github.com/apache/iceberg/pulls/BsoBird> (#9546
<https://github.com/apache/iceberg/pull/9546> is an earlier version with
more discussion), which works towards removing the version hint file to avoid
discrepancies between the latest committed metadata and the metadata
referenced in the hint file. The two can go out of sync because updating the
hint is not atomic with the commit. Removing the hint introduces a different
problem: you have to determine the latest metadata version by prefix-listing,
which is neither efficient nor desirable on an object store, as you already
mentioned.
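The prefix-listing lookup described above can be sketched as follows. This is a minimal, hypothetical Java example, not Iceberg code: it assumes a local directory stands in for the object-store prefix and that metadata files are named `v<N>.metadata.json`; the class and method names are made up for illustration. It parses the version numbers instead of taking the last entry, because object-store listings (S3's, for example) come back in lexicographic key order, where `v10` sorts before `v2`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the latest metadata version by listing the
// metadata directory instead of trusting a version hint file.
public final class LatestVersionByListing {
    // Assumed naming scheme: v<N>.metadata.json
    private static final Pattern VERSION_FILE = Pattern.compile("v(\\d+)\\.metadata\\.json");

    private LatestVersionByListing() {}

    public static OptionalInt latestVersion(Path metadataDir) throws IOException {
        int max = -1;
        // On an object store, this loop corresponds to one or more paged LIST
        // calls over the metadata prefix; its cost grows with the file count.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(metadataDir)) {
            for (Path entry : entries) {
                Matcher m = VERSION_FILE.matcher(entry.getFileName().toString());
                if (m.matches()) {
                    // Compare numerically, so v10 beats v2.
                    max = Math.max(max, Integer.parseInt(m.group(1)));
                }
            }
        }
        return max < 0 ? OptionalInt.empty() : OptionalInt.of(max);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("metadata");
        String[] names = {"v1.metadata.json", "v2.metadata.json", "v10.metadata.json", "version-hint.text"};
        for (String name : names) {
            Files.createFile(dir.resolve(name));
        }
        System.out.println(latestVersion(dir)); // prints OptionalInt[10]
    }
}
```

Each lookup touches every retained metadata file under the prefix, which is the cost that makes this approach unattractive on object stores compared with reading a single hint file.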

Kind regards,
Fokko

On Wed, 31 Jul 2024 at 04:39, Jack Ye <yezhao...@gmail.com> wrote:

> Atomicity is just one requirement; it also needs to be efficient,
> ideally a metadata-only operation.
>
> Looking at the GCS documentation [1], the rename operation is still a
> COPY + DELETE behind the scenes unless the bucket uses a hierarchical
> namespace. The Azure documentation [2] also suggests that the fast rename
> feature is only available with hierarchical namespace, which is for the Gen2
> buckets. I found little documentation about the exact rename guarantees and
> semantics of ADLS, though. Still, at least GCS and Azure should be able to
> work with HadoopCatalog pretty well with their latest offerings.
>
> Steve, if you could share more insight into this and any related
> documentation, that would be really appreciated.
>
> -Jack
>
> [1] https://cloud.google.com/storage/docs/rename-hns-folders
> [2] https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace#the-benefits-of-a-hierarchical-namespace
>
> On Tue, Jul 30, 2024 at 11:11 AM Steve Loughran
> <ste...@cloudera.com.invalid> wrote:
>
>>
>>
>> On Thu, 18 Jul 2024 at 00:02, Ryan Blue <b...@apache.org> wrote:
>>
>>> Hey everyone,
>>>
>>> There has been some recent discussion about improving
>>> HadoopTableOperations and the catalog based on those tables, but we've
>>> discouraged using file system only table (or "hadoop" tables) for years now
>>> because of major problems:
>>> * It is only safe to use hadoop tables with HDFS; most local file
>>> systems, S3, and other common object stores are unsafe
>>>
>>
>> Azure storage and Linux local filesystems all support atomic file and dir
>> rename and delete; Google GCS does it for files and dirs only. Windows,
>> well, anybody who claims to understand the semantics of MoveFile is
>> probably wrong (
>> https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-movefilewithprogressw
>> )
>>
>>> * Despite not providing atomicity guarantees outside of HDFS, people use
>>> the tables in unsafe situations
>>>
>>
>> which means "s3", unless something needs directory rename
>>
>>
>>> * HadoopCatalog cannot implement atomic operations for rename and drop
>>> table, which are commonly used in data engineering
>>> * Alternative file names (for instance when using metadata file
>>> compression) also break guarantees
>>>
>>> While these tables are useful for testing in non-production scenarios, I
>>> think it's misleading to have them in the core module because that gives the
>>> impression that they are a reasonable choice. I propose we deprecate the
>>> HadoopTableOperations and HadoopCatalog implementations and move them to
>>> tests the next time we can make breaking API changes (2.0).
>>>
>>> I think we should also consider similar fixes to the table spec. It
>>> currently describes how HadoopTableOperations works, which is not safe in
>>> object stores or local file systems. HDFS is becoming much less common and
>>> I propose that we note that the strategy in the spec should ONLY be used
>>> with HDFS.
>>>
>>> What do other people think?
>>>
>>> Ryan
>>>
>>> --
>>> Ryan Blue
>>>
>>
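Ryan's points hinge on what the commit step needs from storage. In the file-system scheme that HadoopTableOperations implements, a writer puts the new metadata in a uniquely named file and then renames it to `v<N+1>.metadata.json`; a concurrent writer is only detected if that rename fails when the target already exists. POSIX `rename(2)` atomically replaces an existing target, and Java's `Files.move` with `ATOMIC_MOVE` leaves it implementation-specific whether an existing target is replaced, so an atomic rename by itself is not enough. The hedged sketch below uses a hard link (`Files.createLink`, which fails if the name exists) as a stand-in for a no-overwrite rename on a local POSIX file system. The class and method names are hypothetical and this is not Iceberg's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical sketch of committing metadata version N+1 on a local POSIX
// file system. A hard link stands in for "rename that fails if the target exists".
public final class NoClobberCommit {
    private NoClobberCommit() {}

    /** Returns true if this writer committed nextVersion, false if another writer got there first. */
    public static boolean tryCommit(Path metadataDir, int nextVersion, String metadataJson) throws IOException {
        // Step 1: write the full metadata to a uniquely named file, so the
        // well-known name never points at a partially written file.
        Path temp = metadataDir.resolve(UUID.randomUUID() + ".metadata.json");
        Files.write(temp, metadataJson.getBytes(StandardCharsets.UTF_8));
        Path target = metadataDir.resolve("v" + nextVersion + ".metadata.json");
        try {
            // Step 2: link(2) atomically creates the target name, or fails if it already exists.
            Files.createLink(target, temp);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Another writer committed this version; the caller must refresh and retry.
            return false;
        } finally {
            // The uniquely named file is no longer needed in either case.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("metadata");
        System.out.println(tryCommit(dir, 1, "{\"writer\":\"a\"}")); // true
        System.out.println(tryCommit(dir, 1, "{\"writer\":\"b\"}")); // false
    }
}
```

A writer that gets `false` has to re-read the current version and rebuild its metadata before retrying. A store that offers neither a conditional create nor a no-overwrite rename has no equivalent of the link step, which is why the scheme is only safe where that guarantee exists.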