My guess would be to avoid complications with multiple committers
attempting to swap at the same time.
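
To illustrate what I mean, here is a rough sketch on a local POSIX file system. It is not how HadoopTableOperations is written; `try_commit` and `latest_version` are hypothetical names, and `os.link` stands in for a rename that refuses to overwrite its destination. Two committers that both start from version N race for the same v<N+1>.metadata.json name, and exactly one of them wins:

```python
import os
import re
import tempfile

# Matches names like v12.metadata.json (the naming discussed in this thread).
VERSION_RE = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_version(metadata_dir):
    """Discover the latest committed version by listing the directory."""
    latest = 0
    for name in os.listdir(metadata_dir):
        match = VERSION_RE.match(name)
        if match:
            latest = max(latest, int(match.group(1)))
    return latest


def try_commit(metadata_dir, base_version, payload):
    """Try to publish base_version + 1; return True if this committer won."""
    target = os.path.join(metadata_dir, f"v{base_version + 1}.metadata.json")
    # Write the new metadata under a temporary name first.
    fd, tmp_path = tempfile.mkstemp(dir=metadata_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(payload)
    try:
        # Assumption: os.link is used here as a no-overwrite rename.
        # It raises FileExistsError if the target name is already taken.
        os.link(tmp_path, target)
        return True
    except FileExistsError:
        return False
    finally:
        # The temporary name is no longer needed either way.
        os.unlink(tmp_path)


if __name__ == "__main__":
    table_dir = tempfile.mkdtemp()
    print(try_commit(table_dir, 0, "writer A"))  # first committer
    print(try_commit(table_dir, 0, "writer B"))  # same base version, conflicts
    print(latest_version(table_dir))
```

With a fixed name such as latest.metadata.json, both writers would target the same file on every commit, so the swap would have to overwrite and the file name alone could not tell a stale committer that it lost.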

On Wed, Jul 31, 2024 at 9:50 AM Jack Ye wrote:

> I see, thank you Fokko, this is very helpful context.
> Looking at the PR and the discussion in it, it seems like
> the version hint file is the key problem here. The file system table spec
> [1] is technically correct and only uses a single rename operation to
> perform the atomic commit, and defines that the v<version>.metadata.json is
> the latest file. However, the additional write of a version hint file seems
> problematic: it can fail independently of the rename and cause all sorts of
> edge-case behaviors, and it does not strictly follow the spec.
> I agree that if we want to properly follow the current file system table
> spec, then the right way is to stop the commit process after renaming to
> the v<version>.metadata.json, and the reader should perform a listing to
> discover the latest metadata file. If we go with that, this is
> interestingly becoming highly similar to the Delta Lake protocol, where the
> zero-padded log files [2] are discovered using this mechanism [3] I
> believe. And they have implementations for different storage systems
> including HDFS, S3, Azure, GCS, with a pluggable extension point.
> One question I have now: what is the motivation in the file system table
> spec to rename the latest table metadata to v<version>.metadata.json,
> rather than just a fixed name like latest.metadata.json? Why is the version
> number in the file name important?
> -Jack
> [1]
> [2]
> [3]
> On Tue, Jul 30, 2024 at 10:52 PM Fokko Driesprong wrote:
>> Jack,
>>> no atomic drop table support: this seems pretty fixable, as you can
>>> change the semantics of dropping a table to be deleting the latest table
>>> version hint file, instead of having to delete everything in the folder. I
>>> feel that actually also fits the semantics of purge/no-purge better.
>> I would invite you to check out lisoda's PR (#9546 is an earlier version
>> with more discussion) that works towards removing the version hint file to
>> avoid discrepancies between the latest committed metadata and the metadata
>> that's referenced in the hint file. These can go out of sync since the
>> operation there is not atomic. Removing the hint file introduces another
>> problem: you have to determine the latest version of the metadata using
>> prefix-listing, which is neither efficient nor desirable on an object
>> store, as you already mentioned.
>> Kind regards,
>> Fokko
>> On Wed, Jul 31, 2024 at 04:39, Jack Ye wrote:
>>> Atomicity is just one requirement; the operation also needs to be
>>> efficient, ideally a metadata-only operation.
>>> Looking at the GCS documentation [1], the rename operation is still
>>> a COPY + DELETE behind the scenes unless it is a hierarchical namespace
>>> bucket. The Azure documentation [2] also suggests that the fast rename
>>> feature is only available with hierarchical namespace, which is for the
>>> Gen2 buckets. I found little documentation about the exact rename
>>> guarantees and semantics of ADLS though. But it is undeniable that at
>>> least GCS and Azure should be able to work with HadoopCatalog pretty well
>>> with their latest offerings.
>>> Steve, if you could share more insight into this and related
>>> documentation, that would be really appreciated.
>>> -Jack
>>> [1]
>>> [2]
>>> On Tue, Jul 30, 2024 at 11:11 AM Steve Loughran wrote:
>>>> On Thu, 18 Jul 2024 at 00:02, Ryan Blue wrote:
>>>>> Hey everyone,
>>>>> There has been some recent discussion about improving
>>>>> HadoopTableOperations and the catalog based on those tables, but we've
>>>>> discouraged using file-system-only tables (or "hadoop" tables) for years
>>>>> now because of major problems:
>>>>> * It is only safe to use hadoop tables with HDFS; most local file
>>>>> systems, S3, and other common object stores are unsafe
>>>> Azure storage and Linux local filesystems all support atomic file and
>>>> dir rename and delete; Google GCS does it for files and dirs only. Windows,
>>>> well, anybody who claims to understand the semantics of MoveFile is
>>>> probably wrong.
>>>>> * Despite not providing atomicity guarantees outside of HDFS, people
>>>>> use the tables in unsafe situations
>>>> which means "s3", unless something needs directory rename
>>>>> * HadoopCatalog cannot implement atomic operations for rename and drop
>>>>> table, which are commonly used in data engineering
>>>>> * Alternative file names (for instance when using metadata file
>>>>> compression) also break guarantees
>>>>> While these tables are useful for testing in non-production scenarios,
>>>>> I think it's misleading to have them in the core module because there's an
>>>>> appearance that they are a reasonable choice. I propose we deprecate the
>>>>> HadoopTableOperations and HadoopCatalog implementations and move them to
>>>>> tests the next time we can make breaking API changes (2.0).
>>>>> I think we should also consider similar fixes to the table spec. It
>>>>> currently describes how HadoopTableOperations works, which does not work
>>>>> in object stores or local file systems. HDFS is becoming much less common and
>>>>> I propose that we note that the strategy in the spec should ONLY be used
>>>>> with HDFS.
>>>>> What do other people think?
>>>>> Ryan
>>>>> --
>>>>> Ryan Blue
