I agree that we shouldn't retroactively change the spec to align with the implementation.
We're trying to strike a balance here and I think the notes are an effective way to convey that while omitting or producing null with v1/v2 is technically to spec, it's going to be incompatible with most implementations and should be discouraged for compatibility reasons. So, no on spec change, yes on notes. On Tue, Feb 18, 2025 at 12:37 PM Russell Spitzer <russell.spit...@gmail.com> wrote: > `For compatibility with existing libraries, we should maintain that `-1` > is equivalent to no snapshot and it should be written for v1/v2.` > > The only issue I have with this is are we saying that for v1 and v2 we are > changing the spec to say that current-snapshot-id is required? Or are we > adding an implementation note, the java implementation will always return > -1 for these tables but this is not required for other clients? > > Unless we say that it is required we can have other implementations which > can return null or missing. I'm not a big fan of changing the spec to match > the implementation especially when we are narrowing what is allowed. > > > On Tue, Feb 18, 2025 at 2:15 PM Daniel Weeks <dwe...@apache.org> wrote: > >> I would agree the best path forward is to note the current behavior for >> v1/v2 since that's well established and address the behavior in v3. >> >> For compatibility with existing libraries, we should maintain that `-1` >> is equivalent to no snapshot and it should be written for v1/v2. >> >> With V3 we should support `-1` for backwards compatibility (though >> there's low likelihood that someone will promote an empty table), but >> always write `null`. >> >> I believe this is in part due to TableMetadata tracking the >> current-snapshot-id as a `long` value (as opposed to a Long) reference, >> which resulted in this oversight. >> >> -Dan >> >> On Tue, Feb 18, 2025 at 11:57 AM rdb...@gmail.com <rdb...@gmail.com> >> wrote: >> >>> +1 to reverting PT 11560 in main and 1.8.1. That avoids unnecessary >>> incompatibility with older readers. >>> >>> I also agree that we should update the spec to say what Russell suggests: >>> > that -1 has meant "no current snapshot" in the past and is >>> equivalent to missing/null. >>> >>> That's a correct description of the behavior. >>> >>> I don't think that we should create new requirements for this that >>> didn't previously exist -- that is, I don't see much value in going back to >>> mandate this behavior when writing v1 or v2 tables. If an implementation >>> were writing null for the current snapshot ID up until now, I don't think >>> that we can say that behavior was or is incorrect. It was an >>> incompatibility with the Java implementation and we should note the >>> behavior in the "Implementation Notes" section. >>> >>> > How about reverting #11560 for 1.8.1, and then reinstating this for >>> 2.0.0? >>> >>> I think we need to fix this at a format version boundary, not a library >>> version boundary. I'd be up for reinstating the write change for v3. >>> >>> On Tue, Feb 18, 2025 at 7:56 AM Russell Spitzer < >>> russell.spit...@gmail.com> wrote: >>> >>>> The only thing I think I agree with is defining that -1 has meant "no >>>> current snapshot" in the past and >>>> is equivalent to a missing/nuil (if we have to specify that) . >>>> >>>> I don't think there is any reason to change the behavior of writing >>>> null / missing unless >>>> that's really a point of confusion for folks. Is there a JSON library >>>> folks are using >>>> which distinguishes between missing/null? >>>> >>>> Would having field:null be the same as missing have avoided the issue >>>> because it seems like libraries >>>> that only handled non-null would also not handle "missing" >>>> current-snapshot-id as well? >>>> >>>> I'd really like to hear from other implementers here since changing all >>>> this again would be a lot of >>>> work and I thought we had understanding of Optional == Nullable as a >>>> valid thing in the spec. >>>> >>>> On Tue, Feb 18, 2025 at 7:35 AM Robert Stupp <sn...@snazy.de> wrote: >>>> >>>>> Correcting myself: schema/spec/sort seem to be always present - please >>>>> ignore that part in my previous email. The valid values for those fields >>>>> however should be defined. >>>>> On 18.02.25 14:29, Robert Stupp wrote: >>>>> >>>>> Reality is that Iceberg did write '-1' into current-snapshot-id (and >>>>> other "non-exist" marker values for schema/spec/sort) instead of omitting >>>>> the field. >>>>> >>>>> Side note: the table-spec says that these fields are optional, but >>>>> nothing about whether it is nullable. >>>>> >>>>> The spec should at least be amended to explicitly define the valid >>>>> values (-1 for current-snapshot-id for no snapshot is just a fact now that >>>>> it's there). IMO that field being 'null' isn't defined in the spec, but >>>>> the >>>>> absence of the field is. >>>>> >>>>> Proposal for the implementation: >>>>> >>>>> * revert https://github.com/apache/iceberg/pull/11560 in 1.8.1 and on >>>>> main >>>>> >>>>> Proposal for the spec (for current-snapshot-id): >>>>> >>>>> * Define the valid value ranges for the field >>>>> * Define that the absence of the field means "no current snapshot" >>>>> * Define that the value -1 of the field means "no current snapshot" >>>>> * Define that the value 'null' of the field means "no current snapshot" >>>>> * Define that new implementations must not write the field if there's >>>>> no current snapshot >>>>> Proposal for the spec (for schema/spec/sort): >>>>> >>>>> * Define the valid value ranges for the fields >>>>> * Define the "schema/spec/sort not present" values (the fields are >>>>> optional for v1 but required for v2+v3). >>>>> * OR Define that "schema/spec/sort must be absent" if there is no >>>>> current schema/spec/sort. >>>>> >>>>> WDYT? >>>>> >>>>> On 17.02.25 21:07, Russell Spitzer wrote: >>>>> >>>>> It sounds like the argument here is that we should change the Spec for >>>>> V1, V2, and V3 to mark current-snapshot-id >>>>> as required. Then we should change all other implementations to follow >>>>> this new standard. I'm not sure that >>>>> is a good solution going forwards but I'm not sure of how we can >>>>> support catalogs/engines that cannot handle a null >>>>> correctly in this situation otherwise. Perhaps we should source out to >>>>> see if any other implementers worked off the >>>>> assumption of a non-optional "current-snapshot-id" and if we get a >>>>> critical mass we can try to make that change? >>>>> Because of how wide that change would be, I think we would need pretty >>>>> broad consensus to do so. >>>>> >>>>> We could possibly also have a flag to allow the old behavior but that >>>>> also feels wrong to me, we have often gone with a motto of >>>>> read "wrong" write "correct" for things like this in the past and >>>>> continuing to write "wrong" is a disservice to any >>>>> new implementers . When we do have a contradiction between our >>>>> implementation and the spec I think we have >>>>> to trust that implementers followed the spec and fix the core library. >>>>> >>>>> Are there any other solutions here? >>>>> >>>>> On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org> >>>>> wrote: >>>>> >>>>>> Hey Robert, >>>>>> >>>>>> The thing is, that -1 cannot "go away". >>>>>> >>>>>> >>>>>> Yes, I agree, but that's also the case for null, as the field is optional >>>>>> in the spec >>>>>> <https://iceberg.apache.org/spec/?column-projection#table-metadata-fields>. >>>>>> Therefore we support both in PyIceberg >>>>>> <https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/table/metadata.py#L71-L77>, >>>>>> Iceberg-Rust >>>>>> <https://github.com/apache/iceberg-rust/blob/752d69041e0461989c48dd1ca79bcff577776f5d/crates/iceberg/src/spec/table_metadata.rs#L500>, >>>>>> Iceberg-Java >>>>>> <https://github.com/apache/iceberg/blob/bcbbd0344623ffea5b092e2de5debb0bc12892a1/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L458-L462>, >>>>>> and Iceberg-Go >>>>>> <https://github.com/apache/iceberg-go/blob/ada5480954d9b41d2f8eb4c765523614fad65e1a/table/metadata.go#L837-L841>. >>>>>> On the write side, they all produce null instead of -1. Therefore, I >>>>>> was surprised that it comes up now, and not earlier. >>>>>> >>>>>> I'd prefer to keep the previous behavior - otherwise implementations >>>>>>> may fall back to 0, which is definitely wrong. >>>>>> >>>>>> >>>>>> I'm not seeing why it would fall back to 0, and I agree, that's wrong. >>>>>> >>>>>> Would be better IMHO not to break existing implementations / render >>>>>>> existing setups incompatible with Iceberg 1.8. >>>>>> >>>>>> >>>>>> In my opinion, if this had been caught in an RC, it would be open for >>>>>> discussion, but that ship has sailed. Let's hear what others think. >>>>>> >>>>>> Kind regards, >>>>>> Fokko >>>>>> >>>>>> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de>: >>>>>> >>>>>>> Hi Fokko, >>>>>>> >>>>>>> sure, in general "absent" or "null" would be cleaner. But now we >>>>>>> have two representations for the same case - I suspect most went with >>>>>>> the >>>>>>> "reference behavior". >>>>>>> >>>>>>> The thing is, that -1 cannot "go away". >>>>>>> >>>>>>> I'd prefer to keep the previous behavior - otherwise implementations >>>>>>> may fall back to 0, which is definitely wrong. Would be better IMHO not >>>>>>> to >>>>>>> break existing implementations / render existing setups incompatible >>>>>>> with >>>>>>> Iceberg 1.8. >>>>>>> >>>>>>> >>>>>>> On 17.02.25 15:49, Fokko Driesprong wrote: >>>>>>> >>>>>>> Hey Robert, >>>>>>> >>>>>>> Thanks for raising this. >>>>>>> >>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are >>>>>>>> not defined in the spec. >>>>>>> >>>>>>> >>>>>>> For me, this is invalid, since there is no snapshot with -1 in the >>>>>>> snapshots property. In the tests with the PR, you can see that >>>>>>> there are no snapshots >>>>>>> <https://github.com/apache/iceberg/pull/11560/files#diff-41bdfb6698d2aa7b47ff7d5fabc558a5a64f8b7496fe1bcd8f8ecb69b2afc128R112>. >>>>>>> A year ago we had a similar discussion on PyIceberg >>>>>>> <https://py.iceberg.apache.org/configuration/#backward-compatibility> >>>>>>> around this and this ended up in adding a flag to fall back to the old >>>>>>> behavior. I do agree that we should have communicated this more clearly >>>>>>> with the release. >>>>>>> >>>>>>> Kind regards, >>>>>>> Fokko >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de>: >>>>>>> >>>>>>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced >>>>>>>> a behavior change. >>>>>>>> >>>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are >>>>>>>> not defined in the spec. >>>>>>>> >>>>>>>> Previous Iceberg-Java versions always produced -1 if there's no >>>>>>>> current snapshot - 1.8 produces `null` in that case. So there are now >>>>>>>> two >>>>>>>> _different_ values for "no current snapshot". >>>>>>>> >>>>>>>> Implementations that rely on the behavior of the "reference >>>>>>>> implementation" (Iceberg-Java) do now fail in case there's no current >>>>>>>> snapshot. >>>>>>>> On 13.02.25 10:09, Amogh Jahagirdar wrote: >>>>>>>> >>>>>>>> I'm pleased to announce the release of Apache Iceberg 1.8.0! >>>>>>>> >>>>>>>> Apache Iceberg is an open table format for huge analytic datasets. >>>>>>>> Iceberg >>>>>>>> delivers high query performance for tables with tens of petabytes >>>>>>>> of data, >>>>>>>> along with atomic commits, concurrent writes, and SQL-compatible >>>>>>>> table >>>>>>>> evolution. >>>>>>>> >>>>>>>> This release can be downloaded from: >>>>>>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz >>>>>>>> >>>>>>>> >>>>>>>> Release notes: https://iceberg.apache.org/releases/#180-release >>>>>>>> <https://iceberg.apache.org/releases/180-release> >>>>>>>> >>>>>>>> Java artifacts are available from Maven Central. >>>>>>>> >>>>>>>> Thanks to everyone for contributing! >>>>>>>> >>>>>>>> -- >>>>>>>> Robert Stupp >>>>>>>> @snazy >>>>>>>> >>>>>>>> -- >>>>>>> Robert Stupp >>>>>>> @snazy >>>>>>> >>>>>>> -- >>>>> Robert Stupp >>>>> @snazy >>>>> >>>>> -- >>>>> Robert Stupp >>>>> @snazy >>>>> >>>>>