I'm not convinced this is incorrect behavior (table spec or implementation), but it does lend to some confusion. The 'summary' field is optional, which means that if a summary is not provided, you do not have an associated 'operation' field. The 'operation' field is only required in the context of the summary, so it's actually possible for the implementation (i.e. the tests you reference) to not have an operation.
I think what is wrong here is that the REST spec marked the summary as required <https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2040>, which is inconsistent with the table spec. On Wed, Oct 16, 2024 at 3:52 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote: > Based on [1], we never persisted the operation in the summary map. > Instead, we persisted it as a top-level field in Java, which is actually > NOT what the spec says. Does anyone remember cases when the operation was > unknown? I personally don't. > > [1] - > https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L63 > > > ср, 16 жовт. 2024 р. о 12:42 Kevin Liu <kevin.jq....@gmail.com> пише: > >> Hey folks, >> >> I’ve noticed a discrepancy between the Iceberg specification and the Java >> implementation regarding the `operation` key in the `Snapshot` `summary` >> field. >> >> The `Snapshot` object's `summary` dictionary includes a *required* key >> named `operation`, as outlined in the spec describing Table Metadata and >> Snapshots [1] and the generated OpenAPI YAML [2]. However, in the Java >> implementation [3], `operation` is treated as optional. In contrast, it >> remains a required field in the Python implementation [4]. >> I also found that Java tests for `SnapshotParser` assert that the >> `operation` field is null. [5] >> >> Due to this discrepancy, a user reported [6] that the `metadata.json` >> file generated for an Iceberg table could not be read by PyIceberg, though >> it is readable using the Iceberg Java library. >> >> How should we proceed from here? Should the Java library enforce this >> requirement? Additionally, how should we handle existing `metadata.json` >> files that were generated without this field? >> >> Best, >> Kevin Liu >> >> [1] https://iceberg.apache.org/spec/#table-metadata-and-snapshots >> [2] >> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2057-L2060 >> [3] >> https://github.com/apache/iceberg/blob/64b36999d7ff716ae2534fb0972fcc10d22a64c2/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124 >> [4] >> https://github.com/apache/iceberg-python/blob/7cf0c225c3cdb32ac5e390de06b7b0e4fe7de92e/pyiceberg/table/snapshots.py#L182 >> [5] >> https://github.com/apache/iceberg/blob/22a6b19c2e226eacc0aa78c1f2ffbdbb168b13be/core/src/test/java/org/apache/iceberg/TestSnapshotJson.java#L52 >> [6] https://github.com/apache/iceberg-python/issues/1106 >> >>