It sounds like the argument here is that we should change the Spec for V1,
V2, and V3 to mark current-snapshot-id
as required. Then we should change all other implementations to follow this
new standard. I'm not sure that
is a good solution going forwards but I'm not sure of how we can support
catalogs/engines that cannot handle a null
correctly in this situation otherwise. Perhaps we should source out to see
if any other implementers worked off the
assumption of a non-optional "current-snapshot-id" and if we get a critical
mass we can try to make that change?
Because of how wide that change would be, I think we would need pretty
broad consensus to do so.

We could possibly also have a flag to allow the old behavior but that also
feels wrong to me, we have often gone with a motto of
read "wrong" write "correct" for things like this in the past and
continuing to write "wrong" is a disservice to any
new implementers . When we do have a contradiction between our
implementation and the spec I think we have
to trust that implementers followed the spec and fix the core library.

Are there any other solutions here?

On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org> wrote:

> Hey Robert,
>
> The thing is, that -1 cannot "go away".
>
>
> Yes, I agree, but that's also the case for null, as the field is optional
> in the spec
> <https://iceberg.apache.org/spec/?column-projection#table-metadata-fields>.
> Therefore we support both in PyIceberg
> <https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/table/metadata.py#L71-L77>,
> Iceberg-Rust
> <https://github.com/apache/iceberg-rust/blob/752d69041e0461989c48dd1ca79bcff577776f5d/crates/iceberg/src/spec/table_metadata.rs#L500>,
> Iceberg-Java
> <https://github.com/apache/iceberg/blob/bcbbd0344623ffea5b092e2de5debb0bc12892a1/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L458-L462>,
> and Iceberg-Go
> <https://github.com/apache/iceberg-go/blob/ada5480954d9b41d2f8eb4c765523614fad65e1a/table/metadata.go#L837-L841>.
> On the write side, they all produce null instead of -1. Therefore, I was 
> surprised that
> it comes up now, and not earlier.
>
> I'd prefer to keep the previous behavior - otherwise implementations may
>> fall back to 0, which is definitely wrong.
>
>
> I'm not seeing why it would fall back to 0, and I agree, that's wrong.
>
> Would be better IMHO not to break existing implementations / render
>> existing setups incompatible with Iceberg 1.8.
>
>
> In my opinion, if this had been caught in an RC, it would be open for
> discussion, but that ship has sailed. Let's hear what others think.
>
> Kind regards,
> Fokko
>
> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de>:
>
>> Hi Fokko,
>>
>> sure, in general "absent" or "null" would be cleaner. But now we have two
>> representations for the same case - I suspect most went with the "reference
>> behavior".
>>
>> The thing is, that -1 cannot "go away".
>>
>> I'd prefer to keep the previous behavior - otherwise implementations may
>> fall back to 0, which is definitely wrong. Would be better IMHO not to
>> break existing implementations / render existing setups incompatible with
>> Iceberg 1.8.
>>
>>
>> On 17.02.25 15:49, Fokko Driesprong wrote:
>>
>> Hey Robert,
>>
>> Thanks for raising this.
>>
>> snapshot-ID -1 isn't per-se invalid, because the valid values are not
>>> defined in the spec.
>>
>>
>> For me, this is invalid, since there is no snapshot with -1 in the
>> snapshots property. In the tests with the PR, you can see that there are
>> no snapshots
>> <https://github.com/apache/iceberg/pull/11560/files#diff-41bdfb6698d2aa7b47ff7d5fabc558a5a64f8b7496fe1bcd8f8ecb69b2afc128R112>.
>> A year ago we had a similar discussion on PyIceberg
>> <https://py.iceberg.apache.org/configuration/#backward-compatibility>
>> around this and this ended up in adding a flag to fall back to the old
>> behavior. I do agree that we should have communicated this more clearly
>> with the release.
>>
>> Kind regards,
>> Fokko
>>
>>
>>
>>
>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de>:
>>
>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced a
>>> behavior change.
>>>
>>> snapshot-ID -1 isn't per-se invalid, because the valid values are not
>>> defined in the spec.
>>>
>>> Previous Iceberg-Java versions always produced -1 if there's no current
>>> snapshot - 1.8 produces `null` in that case. So there are now two
>>> _different_ values for "no current snapshot".
>>>
>>> Implementations that rely on the behavior of the "reference
>>> implementation" (Iceberg-Java) do now fail in case there's no current
>>> snapshot.
>>> On 13.02.25 10:09, Amogh Jahagirdar wrote:
>>>
>>> I'm pleased to announce the release of Apache Iceberg 1.8.0!
>>>
>>> Apache Iceberg is an open table format for huge analytic datasets.
>>> Iceberg
>>> delivers high query performance for tables with tens of petabytes of
>>> data,
>>> along with atomic commits, concurrent writes, and SQL-compatible table
>>> evolution.
>>>
>>> This release can be downloaded from:
>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz
>>>
>>>
>>> Release notes: https://iceberg.apache.org/releases/#180-release
>>> <https://iceberg.apache.org/releases/180-release>
>>>
>>> Java artifacts are available from Maven Central.
>>>
>>> Thanks to everyone for contributing!
>>>
>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>> --
>> Robert Stupp
>> @snazy
>>
>>

Reply via email to