Hey folks,

I’ve noticed a discrepancy between the Iceberg specification and the Java
implementation regarding the `operation` key in the `Snapshot` `summary`
field.

According to the spec section on Table Metadata and Snapshots [1] and the
REST catalog OpenAPI YAML [2], the `Snapshot` object's `summary` map must
include a *required* key named `operation`. However, the Java
implementation [3] treats `operation` as optional, while the Python
implementation [4] still requires it. I also found that the Java tests for
`SnapshotParser` assert that the `operation` field is null [5].

Because of this discrepancy, a user reported [6] that the `metadata.json`
file generated for their Iceberg table could not be read by PyIceberg, even
though the Iceberg Java library reads it without error.
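To make the difference concrete, here is a minimal Python sketch contrasting a strict reader that follows the spec with a lenient one that tolerates a missing `operation`. The helper names `read_operation_strict` and `read_operation_lenient` are hypothetical and are not the actual PyIceberg or Java `SnapshotParser` code. The snapshot JSON is a simplified, made-up entry with the other required snapshot fields omitted.

```python
import json

# Hypothetical, simplified snapshot entry from a metadata.json file whose
# summary lacks the "operation" key (other required snapshot fields omitted).
SNAPSHOT_JSON = """
{
  "snapshot-id": 3051729675574597004,
  "timestamp-ms": 1515100955770,
  "summary": {"added-data-files": "1"}
}
"""


def read_operation_strict(snapshot: dict) -> str:
    """Spec-style check: summary.operation is required, so fail if absent."""
    summary = snapshot.get("summary", {})
    if "operation" not in summary:
        raise ValueError("snapshot summary is missing required key 'operation'")
    return summary["operation"]


def read_operation_lenient(snapshot: dict):
    """Tolerant check: a missing operation is simply returned as None."""
    return snapshot.get("summary", {}).get("operation")


snapshot = json.loads(SNAPSHOT_JSON)
print(read_operation_lenient(snapshot))  # None
try:
    read_operation_strict(snapshot)
except ValueError as err:
    print(f"strict reader rejected the snapshot: {err}")
```

With this input, the lenient reader returns `None` while the strict reader raises `ValueError`, which roughly mirrors the split in behaviour between the two libraries described above.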

How should we proceed from here? Should the Java library enforce this
requirement? Additionally, how should we handle existing `metadata.json`
files that were generated without this field?

Best,
Kevin Liu

[1] https://iceberg.apache.org/spec/#table-metadata-and-snapshots
[2] https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2057-L2060
[3] https://github.com/apache/iceberg/blob/64b36999d7ff716ae2534fb0972fcc10d22a64c2/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124
[4] https://github.com/apache/iceberg-python/blob/7cf0c225c3cdb32ac5e390de06b7b0e4fe7de92e/pyiceberg/table/snapshots.py#L182
[5] https://github.com/apache/iceberg/blob/22a6b19c2e226eacc0aa78c1f2ffbdbb168b13be/core/src/test/java/org/apache/iceberg/TestSnapshotJson.java#L52
[6] https://github.com/apache/iceberg-python/issues/1106
