Hi Sung, It seems we are running to issues related to a mismatch between the REST spec and table specifications. Currently, there's no clear definition of how the REST spec is meant to support different table specs. The closest reference I found is this statement <https://github.com/apache/iceberg/blob/8e743a5b5209569f84b6bace36e1106c67e1eab3/open-api/rest-catalog-open-api.yaml#L30-L30> in the REST spec.
Implementations should ideally support both Iceberg table specs v1 and v2, > with priority given to v2. I agree that the REST spec should aim to support v1, v2, and potentially the upcoming v3. In practice, it seems like the choice of table spec might ultimately be dictated by the REST catalog implementation. A best practice would be for the server to strive to support all Iceberg versions, but the REST spec itself should remain flexible enough to accommodate less strict table specs. For the case you mentioned, it should be fine to make sequence number optional since the spec has to support v1 table spec. It does feel confusing though. WDYT? Yufei On Thu, Oct 17, 2024 at 1:56 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote: > Well, the spec says nothing about a top-level `operation` field in JSON > [1]. Yet the Java implementation produces it [2] and removes the operation > from the summary map. This seems inconsistent? > > - Anton > > [1] - https://iceberg.apache.org/spec/#snapshots > [2] - > https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L63 > > > чт, 17 жовт. 2024 р. о 10:06 Sung Yun <sun...@apache.org> пише: > >> > As a side note, the `rest-catalog-open-api.yaml` file [2] in the >> Iceberg repo contains the latest version of the spec. >> >> I think more clarity on this would be helpful. Is it really the case that >> the Open API spec contains the latest version of the spec? For example, I'm >> noticing a discrepancy between sequence-number in the Table Spec and in the >> Open API Spec... >> >> In the table spec, it's required for V2, but it's optional in the REST >> API Spec: >> >> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2319-L2335 >> >> On 2024/10/17 16:58:17 Kevin Liu wrote: >> > > Based on the example metadata, that looks like it is not to spec, so >> it's >> > reasonable that python would reject it. If the java implementation is >> > allowing for that, it's likely that we're being too relaxed (possibly a >> > holdover from v1 parsing). >> > I believe the Java implementation is relaxing the constraint. I'll >> create a >> > PR with test cases and the necessary changes. >> > >> > > Do you know what produced the metadata? >> > It was created by Snowflake [1]. After verifying this, I'll look into >> > raising the issue with them. >> > >> > As a side note, the `rest-catalog-open-api.yaml` file [2] in the Iceberg >> > repo contains the latest version of the spec. As we're continuing to >> evolve >> > to spec for V3, would it be helpful to create a frozen version >> representing >> > both the V1 and V2 specs for reference, possibly as a separate file? >> > >> > Best, >> > Kevin Liu >> > >> > [1] >> > >> https://github.com/apache/iceberg-python/issues/1106#issuecomment-2312108455 >> > [2] >> > >> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml >> > >> > On Thu, Oct 17, 2024 at 9:20 AM Daniel Weeks <dwe...@apache.org> wrote: >> > >> > > Sung, >> > > >> > > I was thinking of v1, so you're right that manifest-list and summary >> are >> > > required as of v2. The REST Spec seems to follow the v2 definition, >> so I >> > > think we're somewhat implicitly requiring those fields via REST. >> > > >> > > Kevin, >> > > >> > > Based on the example metadata, that looks like it is not to spec, so >> it's >> > > reasonable that python would reject it. If the java implementation is >> > > allowing for that, it's likely that we're being too relaxed (possibly >> a >> > > holdover from v1 parsing). >> > > >> > > Do you know what produced the metadata? >> > > >> > > -Dan >> > > >> > > On Thu, Oct 17, 2024 at 9:02 AM Kevin Liu <kevin.jq....@gmail.com> >> wrote: >> > > >> > >> Thanks for the additional context. >> > >> >> > >> My understanding is that if a Snapshot has a `summary` field, it must >> > >> also have a corresponding `operation` key in the summary map. Is that >> > >> correct? Based on the `SnapshotParser`, this is not enforced [1]. >> > >> >> > >> The underlying issue in #1106 [2] is the missing `operation` field >> when >> > >> the `summary` field is present. >> > >> For example, >> > >> ``` >> > >> "summary" : { >> > >> "manifests-created" : "8", >> > >> "total-records" : "26508666891", >> > >> "added-files-size" : "3927895626752", >> > >> "manifests-kept" : "0", >> > >> "total-files-size" : "3927895626752", >> > >> "added-records" : "26508666891", >> > >> "added-data-files" : "231513", >> > >> "manifests-replaced" : "0", >> > >> "total-data-files" : "231513" >> > >> } >> > >> ``` >> > >> >> > >> It could be the case that this particular `metadata.json` was >> generated >> > >> not according to the spec. >> > >> >> > >> Best, >> > >> Kevin Liu >> > >> >> > >> >> > >> [1] >> > >> >> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124-L142 >> > >> [2] https://github.com/apache/iceberg-python/issues/1106 >> > >> >> > >> >> > >> On Thu, Oct 17, 2024 at 8:47 AM Sung Yun <sun...@apache.org> wrote: >> > >> >> > >>> Thank you for the clarification Daniel, and thank you Kevin for >> raising >> > >>> this issue! >> > >>> >> > >>> Does that mean that we are creating component schemas that are the >> > >>> superset of the V1 and V2 schemas? And if so, should we remove >> summary and >> > >>> manifest-list from the required properties, and add manifests >> optional >> > >>> property to the Snapshot schema to support both V1 and V2 Summary >> specs? >> > >>> https://iceberg.apache.org/spec/#snapshots >> > >>> >> > >>> Or would creating separate component schemas for V1/V2 be a cleaner >> way >> > >>> to align the REST spec with the table spec? >> > >>> >> > >>> Sung >> > >>> >> > >>> On 2024/10/17 15:19:23 Daniel Weeks wrote: >> > >>> > I'm not convinced this is incorrect behavior (table spec or >> > >>> > implementation), but it does lend to some confusion. The >> 'summary' >> > >>> field >> > >>> > is optional, which means that if a summary is not provided, you >> do not >> > >>> have >> > >>> > an associated 'operation' field. The 'operation' field is only >> > >>> required in >> > >>> > the context of the summary, so it's actually possible for the >> > >>> > implementation (i.e. the tests you reference) to not have an >> operation. >> > >>> > >> > >>> > I think what is wrong here is that the REST spec marked the >> summary as >> > >>> > required >> > >>> > < >> > >>> >> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2040 >> > >>> >, >> > >>> > which is inconsistent with the table spec. >> > >>> > >> > >>> > On Wed, Oct 16, 2024 at 3:52 PM Anton Okolnychyi < >> > >>> aokolnyc...@gmail.com> >> > >>> > wrote: >> > >>> > >> > >>> > > Based on [1], we never persisted the operation in the summary >> map. >> > >>> > > Instead, we persisted it as a top-level field in Java, which is >> > >>> actually >> > >>> > > NOT what the spec says. Does anyone remember cases when the >> > >>> operation was >> > >>> > > unknown? I personally don't. >> > >>> > > >> > >>> > > [1] - >> > >>> > > >> > >>> >> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L63 >> > >>> > > >> > >>> > > >> > >>> > > ср, 16 жовт. 2024 р. о 12:42 Kevin Liu <kevin.jq....@gmail.com> >> > >>> пише: >> > >>> > > >> > >>> > >> Hey folks, >> > >>> > >> >> > >>> > >> I’ve noticed a discrepancy between the Iceberg specification >> and >> > >>> the Java >> > >>> > >> implementation regarding the `operation` key in the `Snapshot` >> > >>> `summary` >> > >>> > >> field. >> > >>> > >> >> > >>> > >> The `Snapshot` object's `summary` dictionary includes a >> *required* >> > >>> key >> > >>> > >> named `operation`, as outlined in the spec describing Table >> > >>> Metadata and >> > >>> > >> Snapshots [1] and the generated OpenAPI YAML [2]. However, in >> the >> > >>> Java >> > >>> > >> implementation [3], `operation` is treated as optional. In >> > >>> contrast, it >> > >>> > >> remains a required field in the Python implementation [4]. >> > >>> > >> I also found that Java tests for `SnapshotParser` assert that >> the >> > >>> > >> `operation` field is null. [5] >> > >>> > >> >> > >>> > >> Due to this discrepancy, a user reported [6] that the >> > >>> `metadata.json` >> > >>> > >> file generated for an Iceberg table could not be read by >> PyIceberg, >> > >>> though >> > >>> > >> it is readable using the Iceberg Java library. >> > >>> > >> >> > >>> > >> How should we proceed from here? Should the Java library >> enforce >> > >>> this >> > >>> > >> requirement? Additionally, how should we handle existing >> > >>> `metadata.json` >> > >>> > >> files that were generated without this field? >> > >>> > >> >> > >>> > >> Best, >> > >>> > >> Kevin Liu >> > >>> > >> >> > >>> > >> [1] >> https://iceberg.apache.org/spec/#table-metadata-and-snapshots >> > >>> > >> [2] >> > >>> > >> >> > >>> >> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2057-L2060 >> > >>> > >> [3] >> > >>> > >> >> > >>> >> https://github.com/apache/iceberg/blob/64b36999d7ff716ae2534fb0972fcc10d22a64c2/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124 >> > >>> > >> [4] >> > >>> > >> >> > >>> >> https://github.com/apache/iceberg-python/blob/7cf0c225c3cdb32ac5e390de06b7b0e4fe7de92e/pyiceberg/table/snapshots.py#L182 >> > >>> > >> [5] >> > >>> > >> >> > >>> >> https://github.com/apache/iceberg/blob/22a6b19c2e226eacc0aa78c1f2ffbdbb168b13be/core/src/test/java/org/apache/iceberg/TestSnapshotJson.java#L52 >> > >>> > >> [6] https://github.com/apache/iceberg-python/issues/1106 >> > >>> > >> >> > >>> > >> >> > >>> > >> > >>> >> > >> >> > >> >