I'm not convinced this is incorrect behavior (table spec or
implementation), but it does lend to some confusion.  The 'summary' field
is optional, which means that if a summary is not provided, you do not have
an associated 'operation' field.  The 'operation' field is only required in
the context of the summary, so it's actually possible for the
implementation (i.e. the tests you reference) to not have an operation.

I think what is wrong here is that the REST spec marked the summary as
required
<https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2040>,
which is inconsistent with the table spec.

On Wed, Oct 16, 2024 at 3:52 PM Anton Okolnychyi <aokolnyc...@gmail.com>
wrote:

> Based on [1], we never persisted the operation in the summary map.
> Instead, we persisted it as a top-level field in Java, which is actually
> NOT what the spec says. Does anyone remember cases when the operation was
> unknown? I personally don't.
>
> [1] -
> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L63
>
>
> ср, 16 жовт. 2024 р. о 12:42 Kevin Liu <kevin.jq....@gmail.com> пише:
>
>> Hey folks,
>>
>> I’ve noticed a discrepancy between the Iceberg specification and the Java
>> implementation regarding the `operation` key in the `Snapshot` `summary`
>> field.
>>
>> The `Snapshot` object's `summary` dictionary includes a *required* key
>> named `operation`, as outlined in the spec describing Table Metadata and
>> Snapshots [1] and the generated OpenAPI YAML [2]. However, in the Java
>> implementation [3], `operation` is treated as optional. In contrast, it
>> remains a required field in the Python implementation [4].
>> I also found that Java tests for `SnapshotParser` assert that the
>> `operation` field is null. [5]
>>
>> Due to this discrepancy, a user reported [6] that the `metadata.json`
>> file generated for an Iceberg table could not be read by PyIceberg, though
>> it is readable using the Iceberg Java library.
>>
>> How should we proceed from here? Should the Java library enforce this
>> requirement? Additionally, how should we handle existing `metadata.json`
>> files that were generated without this field?
>>
>> Best,
>> Kevin Liu
>>
>> [1] https://iceberg.apache.org/spec/#table-metadata-and-snapshots
>> [2]
>> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2057-L2060
>> [3]
>> https://github.com/apache/iceberg/blob/64b36999d7ff716ae2534fb0972fcc10d22a64c2/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124
>> [4]
>> https://github.com/apache/iceberg-python/blob/7cf0c225c3cdb32ac5e390de06b7b0e4fe7de92e/pyiceberg/table/snapshots.py#L182
>> [5]
>> https://github.com/apache/iceberg/blob/22a6b19c2e226eacc0aa78c1f2ffbdbb168b13be/core/src/test/java/org/apache/iceberg/TestSnapshotJson.java#L52
>> [6] https://github.com/apache/iceberg-python/issues/1106
>>
>>

Reply via email to