Hi, I agree with Fokko's sum-up and call for action.
Regards JB On Wed, Feb 19, 2025 at 11:34 AM Fokko Driesprong <fo...@apache.org> wrote: > > Thanks for the historical context here. I wasn't aware of the equivalent > between -1 and null. > > It looks like we're all aligned on not retroactively changing the spec. Let's > wrap this up with some actions: > > To revert the changes, I've raised the PRs (1.8.x, main) and they are already > in (thanks Eduard!). > Created a PR to write null for ≥V3. I like the solution of tying it to the > metadata version, since folks have to update their libraries/catalogs anyway > to use V3. > Distilled the words said here around -1 into an implementation note to avoid > confusion in the future, let me know what y'all think. > > Thanks again everyone for the input here. > > Kind regards, > Fokko > > > > > > > Op di 18 feb 2025 om 21:55 schreef Daniel Weeks <dwe...@apache.org>: >> >> I agree that we shouldn't retroactively change the spec to align with the >> implementation. >> >> We're trying to strike a balance here and I think the notes are an effective >> way to convey that while omitting or producing null with v1/v2 is >> technically to spec, it's going to be incompatible with most implementations >> and should be discouraged for compatibility reasons. >> >> So, no on spec change, yes on notes. >> >> On Tue, Feb 18, 2025 at 12:37 PM Russell Spitzer <russell.spit...@gmail.com> >> wrote: >>> >>> `For compatibility with existing libraries, we should maintain that `-1` is >>> equivalent to no snapshot and it should be written for v1/v2.` >>> >>> The only issue I have with this is are we saying that for v1 and v2 we are >>> changing the spec to say that current-snapshot-id is required? Or are we >>> adding an implementation note, the java implementation will always return >>> -1 for these tables but this is not required for other clients? >>> >>> Unless we say that it is required we can have other implementations which >>> can return null or missing. I'm not a big fan of changing the spec to match >>> the implementation especially when we are narrowing what is allowed. >>> >>> >>> On Tue, Feb 18, 2025 at 2:15 PM Daniel Weeks <dwe...@apache.org> wrote: >>>> >>>> I would agree the best path forward is to note the current behavior for >>>> v1/v2 since that's well established and address the behavior in v3. >>>> >>>> For compatibility with existing libraries, we should maintain that `-1` is >>>> equivalent to no snapshot and it should be written for v1/v2. >>>> >>>> With V3 we should support `-1` for backwards compatibility (though there's >>>> low likelihood that someone will promote an empty table), but always write >>>> `null`. >>>> >>>> I believe this is in part due to TableMetadata tracking the >>>> current-snapshot-id as a `long` value (as opposed to a Long) reference, >>>> which resulted in this oversight. >>>> >>>> -Dan >>>> >>>> On Tue, Feb 18, 2025 at 11:57 AM rdb...@gmail.com <rdb...@gmail.com> wrote: >>>>> >>>>> +1 to reverting PT 11560 in main and 1.8.1. That avoids unnecessary >>>>> incompatibility with older readers. >>>>> >>>>> I also agree that we should update the spec to say what Russell suggests: >>>>> > that -1 has meant "no current snapshot" in the past and is equivalent >>>>> > to missing/null. >>>>> >>>>> That's a correct description of the behavior. >>>>> >>>>> I don't think that we should create new requirements for this that didn't >>>>> previously exist -- that is, I don't see much value in going back to >>>>> mandate this behavior when writing v1 or v2 tables. If an implementation >>>>> were writing null for the current snapshot ID up until now, I don't think >>>>> that we can say that behavior was or is incorrect. It was an >>>>> incompatibility with the Java implementation and we should note the >>>>> behavior in the "Implementation Notes" section. >>>>> >>>>> > How about reverting #11560 for 1.8.1, and then reinstating this for >>>>> > 2.0.0? >>>>> >>>>> I think we need to fix this at a format version boundary, not a library >>>>> version boundary. I'd be up for reinstating the write change for v3. >>>>> >>>>> On Tue, Feb 18, 2025 at 7:56 AM Russell Spitzer >>>>> <russell.spit...@gmail.com> wrote: >>>>>> >>>>>> The only thing I think I agree with is defining that -1 has meant "no >>>>>> current snapshot" in the past and >>>>>> is equivalent to a missing/nuil (if we have to specify that) . >>>>>> >>>>>> I don't think there is any reason to change the behavior of writing null >>>>>> / missing unless >>>>>> that's really a point of confusion for folks. Is there a JSON library >>>>>> folks are using >>>>>> which distinguishes between missing/null? >>>>>> >>>>>> Would having field:null be the same as missing have avoided the issue >>>>>> because it seems like libraries >>>>>> that only handled non-null would also not handle "missing" >>>>>> current-snapshot-id as well? >>>>>> >>>>>> I'd really like to hear from other implementers here since changing all >>>>>> this again would be a lot of >>>>>> work and I thought we had understanding of Optional == Nullable as a >>>>>> valid thing in the spec. >>>>>> >>>>>> On Tue, Feb 18, 2025 at 7:35 AM Robert Stupp <sn...@snazy.de> wrote: >>>>>>> >>>>>>> Correcting myself: schema/spec/sort seem to be always present - please >>>>>>> ignore that part in my previous email. The valid values for those >>>>>>> fields however should be defined. >>>>>>> >>>>>>> On 18.02.25 14:29, Robert Stupp wrote: >>>>>>> >>>>>>> Reality is that Iceberg did write '-1' into current-snapshot-id (and >>>>>>> other "non-exist" marker values for schema/spec/sort) instead of >>>>>>> omitting the field. >>>>>>> >>>>>>> Side note: the table-spec says that these fields are optional, but >>>>>>> nothing about whether it is nullable. >>>>>>> >>>>>>> The spec should at least be amended to explicitly define the valid >>>>>>> values (-1 for current-snapshot-id for no snapshot is just a fact now >>>>>>> that it's there). IMO that field being 'null' isn't defined in the >>>>>>> spec, but the absence of the field is. >>>>>>> >>>>>>> Proposal for the implementation: >>>>>>> >>>>>>> * revert https://github.com/apache/iceberg/pull/11560 in 1.8.1 and on >>>>>>> main >>>>>>> >>>>>>> Proposal for the spec (for current-snapshot-id): >>>>>>> >>>>>>> * Define the valid value ranges for the field >>>>>>> * Define that the absence of the field means "no current snapshot" >>>>>>> * Define that the value -1 of the field means "no current snapshot" >>>>>>> * Define that the value 'null' of the field means "no current snapshot" >>>>>>> * Define that new implementations must not write the field if there's >>>>>>> no current snapshot >>>>>>> >>>>>>> Proposal for the spec (for schema/spec/sort): >>>>>>> >>>>>>> * Define the valid value ranges for the fields >>>>>>> * Define the "schema/spec/sort not present" values (the fields are >>>>>>> optional for v1 but required for v2+v3). >>>>>>> * OR Define that "schema/spec/sort must be absent" if there is no >>>>>>> current schema/spec/sort. >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>>> >>>>>>> On 17.02.25 21:07, Russell Spitzer wrote: >>>>>>> >>>>>>> It sounds like the argument here is that we should change the Spec for >>>>>>> V1, V2, and V3 to mark current-snapshot-id >>>>>>> as required. Then we should change all other implementations to follow >>>>>>> this new standard. I'm not sure that >>>>>>> is a good solution going forwards but I'm not sure of how we can >>>>>>> support catalogs/engines that cannot handle a null >>>>>>> correctly in this situation otherwise. Perhaps we should source out to >>>>>>> see if any other implementers worked off the >>>>>>> assumption of a non-optional "current-snapshot-id" and if we get a >>>>>>> critical mass we can try to make that change? >>>>>>> Because of how wide that change would be, I think we would need pretty >>>>>>> broad consensus to do so. >>>>>>> >>>>>>> We could possibly also have a flag to allow the old behavior but that >>>>>>> also feels wrong to me, we have often gone with a motto of >>>>>>> read "wrong" write "correct" for things like this in the past and >>>>>>> continuing to write "wrong" is a disservice to any >>>>>>> new implementers . When we do have a contradiction between our >>>>>>> implementation and the spec I think we have >>>>>>> to trust that implementers followed the spec and fix the core library. >>>>>>> >>>>>>> Are there any other solutions here? >>>>>>> >>>>>>> On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org> >>>>>>> wrote: >>>>>>>> >>>>>>>> Hey Robert, >>>>>>>> >>>>>>>>> The thing is, that -1 cannot "go away". >>>>>>>> >>>>>>>> >>>>>>>> Yes, I agree, but that's also the case for null, as the field is >>>>>>>> optional in the spec. Therefore we support both in PyIceberg, >>>>>>>> Iceberg-Rust, Iceberg-Java, and Iceberg-Go. On the write side, they >>>>>>>> all produce null instead of -1. Therefore, I was surprised that it >>>>>>>> comes up now, and not earlier. >>>>>>>> >>>>>>>>> I'd prefer to keep the previous behavior - otherwise implementations >>>>>>>>> may fall back to 0, which is definitely wrong. >>>>>>>> >>>>>>>> >>>>>>>> I'm not seeing why it would fall back to 0, and I agree, that's wrong. >>>>>>>> >>>>>>>>> Would be better IMHO not to break existing implementations / render >>>>>>>>> existing setups incompatible with Iceberg 1.8. >>>>>>>> >>>>>>>> >>>>>>>> In my opinion, if this had been caught in an RC, it would be open for >>>>>>>> discussion, but that ship has sailed. Let's hear what others think. >>>>>>>> >>>>>>>> Kind regards, >>>>>>>> Fokko >>>>>>>> >>>>>>>> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de>: >>>>>>>>> >>>>>>>>> Hi Fokko, >>>>>>>>> >>>>>>>>> sure, in general "absent" or "null" would be cleaner. But now we have >>>>>>>>> two representations for the same case - I suspect most went with the >>>>>>>>> "reference behavior". >>>>>>>>> >>>>>>>>> The thing is, that -1 cannot "go away". >>>>>>>>> >>>>>>>>> I'd prefer to keep the previous behavior - otherwise implementations >>>>>>>>> may fall back to 0, which is definitely wrong. Would be better IMHO >>>>>>>>> not to break existing implementations / render existing setups >>>>>>>>> incompatible with Iceberg 1.8. >>>>>>>>> >>>>>>>>> >>>>>>>>> On 17.02.25 15:49, Fokko Driesprong wrote: >>>>>>>>> >>>>>>>>> Hey Robert, >>>>>>>>> >>>>>>>>> Thanks for raising this. >>>>>>>>> >>>>>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are >>>>>>>>>> not defined in the spec. >>>>>>>>> >>>>>>>>> >>>>>>>>> For me, this is invalid, since there is no snapshot with -1 in the >>>>>>>>> snapshots property. In the tests with the PR, you can see that there >>>>>>>>> are no snapshots. A year ago we had a similar discussion on PyIceberg >>>>>>>>> around this and this ended up in adding a flag to fall back to the >>>>>>>>> old behavior. I do agree that we should have communicated this more >>>>>>>>> clearly with the release. >>>>>>>>> >>>>>>>>> Kind regards, >>>>>>>>> Fokko >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de>: >>>>>>>>>> >>>>>>>>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced a >>>>>>>>>> behavior change. >>>>>>>>>> >>>>>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are >>>>>>>>>> not defined in the spec. >>>>>>>>>> >>>>>>>>>> Previous Iceberg-Java versions always produced -1 if there's no >>>>>>>>>> current snapshot - 1.8 produces `null` in that case. So there are >>>>>>>>>> now two _different_ values for "no current snapshot". >>>>>>>>>> >>>>>>>>>> Implementations that rely on the behavior of the "reference >>>>>>>>>> implementation" (Iceberg-Java) do now fail in case there's no >>>>>>>>>> current snapshot. >>>>>>>>>> >>>>>>>>>> On 13.02.25 10:09, Amogh Jahagirdar wrote: >>>>>>>>>> >>>>>>>>>> I'm pleased to announce the release of Apache Iceberg 1.8.0! >>>>>>>>>> >>>>>>>>>> Apache Iceberg is an open table format for huge analytic datasets. >>>>>>>>>> Iceberg >>>>>>>>>> delivers high query performance for tables with tens of petabytes of >>>>>>>>>> data, >>>>>>>>>> along with atomic commits, concurrent writes, and SQL-compatible >>>>>>>>>> table >>>>>>>>>> evolution. >>>>>>>>>> >>>>>>>>>> This release can be downloaded from: >>>>>>>>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz >>>>>>>>>> >>>>>>>>>> Release notes: https://iceberg.apache.org/releases/#180-release >>>>>>>>>> >>>>>>>>>> Java artifacts are available from Maven Central. >>>>>>>>>> >>>>>>>>>> Thanks to everyone for contributing! >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Robert Stupp >>>>>>>>>> @snazy >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Robert Stupp >>>>>>>>> @snazy >>>>>>> >>>>>>> -- >>>>>>> Robert Stupp >>>>>>> @snazy >>>>>>> >>>>>>> -- >>>>>>> Robert Stupp >>>>>>> @snazy