Our general policy in the past has been to accept potentially invalid metadata as long as we can reason about it or until we get to a point where we have to throw an error.
Read liberally and write strictly. I wouldn’t want to change the spec to match an incorrect writer but there is nothing wrong with changing a reader imho if it’s primarily to fix tables. I think the only thing we really want to avoid is making implicit changes to the spec by normalizing completely out of spec metadata. On Tue, Mar 18, 2025 at 4:46 AM Renjie Liu <liurenjie2...@gmail.com> wrote: > I think there are two problems here: > > 1. As Fokko and Eduard mentioned, Glue has produced invalid v1 tables. > 2. The java implementation didn't do a sanity check about table metadata > for required fields. > > We could follow java implementation in other languages, but this seems a > workaround for dealing with invalid table metadata. I'm not sure if this is > desired approach. > > > On Tue, Mar 18, 2025 at 5:26 PM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> I also believe that the Java impl and the spec are correct for V1. >> The TableMetadataParser reads *schemas >> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L356-L380>* >> / *partition_specs >> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L393-L409>* >> first. If those are present, then it must be a V2 table and therefore *schema >> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L382-L387>* >> / *partition_spec >> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L412-L421>* >> don't >> have to be read. >> That being said, *schema* / *partition_spec* are still required fields >> for V1, so I wouldn't change anything in the spec other than maybe adding >> that implementation note of the Java impl. >> >> As Fokko already mentioned and looking at >> https://github.com/apache/iceberg-rust/issues/1088 it seems that the >> problem is with how Glue produces V1 metadata. >> >> >> Eduard >> >> On Tue, Mar 18, 2025 at 9:53 AM Fokko Driesprong <fo...@apache.org> >> wrote: >> >>> Hi everyone, >>> >>> Thanks for raising this. I believe the Java implementation and the spec >>> are still in sync since for V1 we always write schema >>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L176-L182> >>> and partition-spec >>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L192-L196>. >>> According to the PR <https://github.com/apache/iceberg-rust/pull/1087>, >>> it looks like the problem lies in Glue. I believe making the fields in the >>> spec optional instead of required isn't the right approach, since this >>> leads us to a path where we would update the spec for an >>> implementation that is not adhering to it. Since the field is deprecated, I >>> think this is a special case and we could add a section to Implementation >>> Notes <https://iceberg.apache.org/spec/#appendix-f-implementation-notes> >>> that >>> states that the schemas and current-schema-id take priority over schema. >>> >>> Curious to hear what others think. >>> >>> Kind regards, >>> Fokko >>> >>> Op di 18 mrt 2025 om 04:20 schreef Kevin Liu <kevinjq...@apache.org>: >>> >>>> Hi Renjie, >>>> >>>> Thanks for raising this! For context, here's the iceberg-rust PR >>>> <https://github.com/apache/iceberg-rust/pull/1087> where we found the >>>> inconsistency. >>>> I am also in favor of approach 2 for backwards compatibility reasons. >>>> Perhaps, we can include this behavior in "Appendix E: Format version >>>> changes" <https://iceberg.apache.org/spec/#version-2>, under the "Reading >>>> v1 metadata for v2" section. >>>> >>>> Best, >>>> Kevin Liu >>>> >>>> >>>> On Mon, Mar 17, 2025 at 7:17 PM Renjie Liu <liurenjie2...@gmail.com> >>>> wrote: >>>> >>>>> Hi: >>>>> >>>>> We found an inconsistency between java implementation and spec about >>>>> partition-spec and schema in v1 table. >>>>> >>>>> In spec, it says in v1 table partition-spec and schema are required >>>>> but deprecated: https://iceberg.apache.org/spec/#table-metadata >>>>> >>>>> While in java implementation, they are both optional. For schema, it >>>>> checks for `schemas` first, and looks up the current table schema by >>>>> `schema-id`. Otherwise it looks for a `schema` field. Similar things >>>>> happen for `partition spec`. >>>>> >>>>> We found this problem when implementing iceberg-rust. To resolve this >>>>> inconsistency, there are two approaches: >>>>> >>>>> 1. Modify java implementation to match spec, e.g. force checking >>>>> `schema` and `partition spec` field. >>>>> 2. Update spec to claim that both `schema` and `partition spec` are >>>>> optional for v1 table. >>>>> >>>>> Personally I'm in favor of approach 2 as it keeps backward >>>>> compatibility and seems a more reasonable solution to me. >>>>> >>>>> Looking forward to hearing from you! >>>>> >>>>