+1 to reverting PT 11560 in main and 1.8.1. That avoids unnecessary incompatibility with older readers.
I also agree that we should update the spec to say what Russell suggests: > that -1 has meant "no current snapshot" in the past and is equivalent to missing/null. That's a correct description of the behavior. I don't think that we should create new requirements for this that didn't previously exist -- that is, I don't see much value in going back to mandate this behavior when writing v1 or v2 tables. If an implementation were writing null for the current snapshot ID up until now, I don't think that we can say that behavior was or is incorrect. It was an incompatibility with the Java implementation and we should note the behavior in the "Implementation Notes" section. > How about reverting #11560 for 1.8.1, and then reinstating this for 2.0.0? I think we need to fix this at a format version boundary, not a library version boundary. I'd be up for reinstating the write change for v3. On Tue, Feb 18, 2025 at 7:56 AM Russell Spitzer <russell.spit...@gmail.com> wrote: > The only thing I think I agree with is defining that -1 has meant "no > current snapshot" in the past and > is equivalent to a missing/nuil (if we have to specify that) . > > I don't think there is any reason to change the behavior of writing null / > missing unless > that's really a point of confusion for folks. Is there a JSON library > folks are using > which distinguishes between missing/null? > > Would having field:null be the same as missing have avoided the issue > because it seems like libraries > that only handled non-null would also not handle "missing" > current-snapshot-id as well? > > I'd really like to hear from other implementers here since changing all > this again would be a lot of > work and I thought we had understanding of Optional == Nullable as a valid > thing in the spec. > > On Tue, Feb 18, 2025 at 7:35 AM Robert Stupp <sn...@snazy.de> wrote: > >> Correcting myself: schema/spec/sort seem to be always present - please >> ignore that part in my previous email. The valid values for those fields >> however should be defined. >> On 18.02.25 14:29, Robert Stupp wrote: >> >> Reality is that Iceberg did write '-1' into current-snapshot-id (and >> other "non-exist" marker values for schema/spec/sort) instead of omitting >> the field. >> >> Side note: the table-spec says that these fields are optional, but >> nothing about whether it is nullable. >> >> The spec should at least be amended to explicitly define the valid values >> (-1 for current-snapshot-id for no snapshot is just a fact now that it's >> there). IMO that field being 'null' isn't defined in the spec, but the >> absence of the field is. >> >> Proposal for the implementation: >> >> * revert https://github.com/apache/iceberg/pull/11560 in 1.8.1 and on >> main >> >> Proposal for the spec (for current-snapshot-id): >> >> * Define the valid value ranges for the field >> * Define that the absence of the field means "no current snapshot" >> * Define that the value -1 of the field means "no current snapshot" >> * Define that the value 'null' of the field means "no current snapshot" >> * Define that new implementations must not write the field if there's no >> current snapshot >> Proposal for the spec (for schema/spec/sort): >> >> * Define the valid value ranges for the fields >> * Define the "schema/spec/sort not present" values (the fields are >> optional for v1 but required for v2+v3). >> * OR Define that "schema/spec/sort must be absent" if there is no current >> schema/spec/sort. >> >> WDYT? >> >> On 17.02.25 21:07, Russell Spitzer wrote: >> >> It sounds like the argument here is that we should change the Spec for >> V1, V2, and V3 to mark current-snapshot-id >> as required. Then we should change all other implementations to follow >> this new standard. I'm not sure that >> is a good solution going forwards but I'm not sure of how we can support >> catalogs/engines that cannot handle a null >> correctly in this situation otherwise. Perhaps we should source out to >> see if any other implementers worked off the >> assumption of a non-optional "current-snapshot-id" and if we get a >> critical mass we can try to make that change? >> Because of how wide that change would be, I think we would need pretty >> broad consensus to do so. >> >> We could possibly also have a flag to allow the old behavior but that >> also feels wrong to me, we have often gone with a motto of >> read "wrong" write "correct" for things like this in the past and >> continuing to write "wrong" is a disservice to any >> new implementers . When we do have a contradiction between our >> implementation and the spec I think we have >> to trust that implementers followed the spec and fix the core library. >> >> Are there any other solutions here? >> >> On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org> >> wrote: >> >>> Hey Robert, >>> >>> The thing is, that -1 cannot "go away". >>> >>> >>> Yes, I agree, but that's also the case for null, as the field is optional >>> in the spec >>> <https://iceberg.apache.org/spec/?column-projection#table-metadata-fields>. >>> Therefore we support both in PyIceberg >>> <https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/table/metadata.py#L71-L77>, >>> Iceberg-Rust >>> <https://github.com/apache/iceberg-rust/blob/752d69041e0461989c48dd1ca79bcff577776f5d/crates/iceberg/src/spec/table_metadata.rs#L500>, >>> Iceberg-Java >>> <https://github.com/apache/iceberg/blob/bcbbd0344623ffea5b092e2de5debb0bc12892a1/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L458-L462>, >>> and Iceberg-Go >>> <https://github.com/apache/iceberg-go/blob/ada5480954d9b41d2f8eb4c765523614fad65e1a/table/metadata.go#L837-L841>. >>> On the write side, they all produce null instead of -1. Therefore, I was >>> surprised that >>> it comes up now, and not earlier. >>> >>> I'd prefer to keep the previous behavior - otherwise implementations may >>>> fall back to 0, which is definitely wrong. >>> >>> >>> I'm not seeing why it would fall back to 0, and I agree, that's wrong. >>> >>> Would be better IMHO not to break existing implementations / render >>>> existing setups incompatible with Iceberg 1.8. >>> >>> >>> In my opinion, if this had been caught in an RC, it would be open for >>> discussion, but that ship has sailed. Let's hear what others think. >>> >>> Kind regards, >>> Fokko >>> >>> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de>: >>> >>>> Hi Fokko, >>>> >>>> sure, in general "absent" or "null" would be cleaner. But now we have >>>> two representations for the same case - I suspect most went with the >>>> "reference behavior". >>>> >>>> The thing is, that -1 cannot "go away". >>>> >>>> I'd prefer to keep the previous behavior - otherwise implementations >>>> may fall back to 0, which is definitely wrong. Would be better IMHO not to >>>> break existing implementations / render existing setups incompatible with >>>> Iceberg 1.8. >>>> >>>> >>>> On 17.02.25 15:49, Fokko Driesprong wrote: >>>> >>>> Hey Robert, >>>> >>>> Thanks for raising this. >>>> >>>> snapshot-ID -1 isn't per-se invalid, because the valid values are not >>>>> defined in the spec. >>>> >>>> >>>> For me, this is invalid, since there is no snapshot with -1 in the >>>> snapshots property. In the tests with the PR, you can see that there >>>> are no snapshots >>>> <https://github.com/apache/iceberg/pull/11560/files#diff-41bdfb6698d2aa7b47ff7d5fabc558a5a64f8b7496fe1bcd8f8ecb69b2afc128R112>. >>>> A year ago we had a similar discussion on PyIceberg >>>> <https://py.iceberg.apache.org/configuration/#backward-compatibility> >>>> around this and this ended up in adding a flag to fall back to the old >>>> behavior. I do agree that we should have communicated this more clearly >>>> with the release. >>>> >>>> Kind regards, >>>> Fokko >>>> >>>> >>>> >>>> >>>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de>: >>>> >>>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced a >>>>> behavior change. >>>>> >>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are not >>>>> defined in the spec. >>>>> >>>>> Previous Iceberg-Java versions always produced -1 if there's no >>>>> current snapshot - 1.8 produces `null` in that case. So there are now two >>>>> _different_ values for "no current snapshot". >>>>> >>>>> Implementations that rely on the behavior of the "reference >>>>> implementation" (Iceberg-Java) do now fail in case there's no current >>>>> snapshot. >>>>> On 13.02.25 10:09, Amogh Jahagirdar wrote: >>>>> >>>>> I'm pleased to announce the release of Apache Iceberg 1.8.0! >>>>> >>>>> Apache Iceberg is an open table format for huge analytic datasets. >>>>> Iceberg >>>>> delivers high query performance for tables with tens of petabytes of >>>>> data, >>>>> along with atomic commits, concurrent writes, and SQL-compatible table >>>>> evolution. >>>>> >>>>> This release can be downloaded from: >>>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz >>>>> >>>>> >>>>> Release notes: https://iceberg.apache.org/releases/#180-release >>>>> <https://iceberg.apache.org/releases/180-release> >>>>> >>>>> Java artifacts are available from Maven Central. >>>>> >>>>> Thanks to everyone for contributing! >>>>> >>>>> -- >>>>> Robert Stupp >>>>> @snazy >>>>> >>>>> -- >>>> Robert Stupp >>>> @snazy >>>> >>>> -- >> Robert Stupp >> @snazy >> >> -- >> Robert Stupp >> @snazy >> >>