Our general policy in the past has been to accept potentially invalid
metadata as long as we can reason about it or until we get to a point where
we have to throw an error.

Read liberally and write strictly.

I wouldn’t want to change the spec to match an incorrect writer but there
is nothing wrong with changing a reader imho if it’s primarily to fix
tables. I think the only thing we really want to avoid is making implicit
changes to the spec by normalizing completely out of spec metadata.

On Tue, Mar 18, 2025 at 4:46 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> I think there are two problems here:
>
> 1. As Fokko and Eduard mentioned, Glue has produced invalid v1 tables.
> 2. The java implementation didn't do a sanity check about table metadata
> for required fields.
>
> We could follow java implementation in other languages, but this seems a
> workaround for dealing with invalid table metadata. I'm not sure if this is
> desired approach.
>
>
> On Tue, Mar 18, 2025 at 5:26 PM Eduard Tudenhöfner <
> etudenhoef...@apache.org> wrote:
>
>> I also believe that the Java impl and the spec are correct for V1.
>> The TableMetadataParser reads *schemas
>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L356-L380>*
>> / *partition_specs
>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L393-L409>*
>> first. If those are present, then it must be a V2 table and therefore *schema
>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L382-L387>*
>> / *partition_spec
>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L412-L421>*
>>  don't
>> have to be read.
>> That being said, *schema* / *partition_spec* are still required fields
>> for V1, so I wouldn't change anything in the spec other than maybe adding
>> that implementation note of the Java impl.
>>
>> As Fokko already mentioned and looking at
>> https://github.com/apache/iceberg-rust/issues/1088 it seems that the
>> problem is with how Glue produces V1 metadata.
>>
>>
>> Eduard
>>
>> On Tue, Mar 18, 2025 at 9:53 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Thanks for raising this. I believe the Java implementation and the spec
>>> are still in sync since for V1 we always write schema
>>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L176-L182>
>>> and partition-spec
>>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L192-L196>.
>>> According to the PR <https://github.com/apache/iceberg-rust/pull/1087>,
>>> it looks like the problem lies in Glue. I believe making the fields in the
>>> spec optional instead of required isn't the right approach, since this
>>> leads us to a path where we would update the spec for an
>>> implementation that is not adhering to it. Since the field is deprecated, I
>>> think this is a special case and we could add a section to Implementation
>>> Notes <https://iceberg.apache.org/spec/#appendix-f-implementation-notes> 
>>> that
>>> states that the schemas and current-schema-id take priority over schema.
>>>
>>> Curious to hear what others think.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> Op di 18 mrt 2025 om 04:20 schreef Kevin Liu <kevinjq...@apache.org>:
>>>
>>>> Hi Renjie,
>>>>
>>>> Thanks for raising this! For context, here's the iceberg-rust PR
>>>> <https://github.com/apache/iceberg-rust/pull/1087> where we found the
>>>> inconsistency.
>>>> I am also in favor of approach 2 for backwards compatibility reasons.
>>>> Perhaps, we can include this behavior in "Appendix E: Format version
>>>> changes" <https://iceberg.apache.org/spec/#version-2>, under the "Reading
>>>> v1 metadata for v2" section.
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>>
>>>> On Mon, Mar 17, 2025 at 7:17 PM Renjie Liu <liurenjie2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi:
>>>>>
>>>>> We found an inconsistency between java implementation and spec about
>>>>> partition-spec and schema in v1 table.
>>>>>
>>>>> In spec, it says in v1 table partition-spec and schema are required
>>>>> but deprecated: https://iceberg.apache.org/spec/#table-metadata
>>>>>
>>>>> While in java implementation, they are both optional. For schema, it
>>>>> checks for `schemas` first, and looks up the current table schema by
>>>>> `schema-id`.  Otherwise it looks for a `schema` field. Similar things
>>>>> happen for `partition spec`.
>>>>>
>>>>> We found this problem when implementing iceberg-rust. To resolve this
>>>>> inconsistency, there are two approaches:
>>>>>
>>>>> 1. Modify java implementation to match spec, e.g. force checking
>>>>> `schema` and `partition spec` field.
>>>>> 2. Update spec to claim that both `schema` and `partition spec` are
>>>>> optional for v1 table.
>>>>>
>>>>> Personally I'm in favor of approach 2 as it keeps backward
>>>>> compatibility and seems a more reasonable solution to me.
>>>>>
>>>>> Looking forward to hearing from you!
>>>>>
>>>>

Reply via email to