Hi,

I agree with Fokko's sum-up and call for action.

Regards
JB

On Wed, Feb 19, 2025 at 11:34 AM Fokko Driesprong <fo...@apache.org> wrote:
>
> Thanks for the historical context here. I wasn't aware of the equivalent 
> between -1 and null.
>
> It looks like we're all aligned on not retroactively changing the spec. Let's 
> wrap this up with some actions:
>
> To revert the changes, I've raised the PRs (1.8.x, main) and they are already 
> in (thanks Eduard!).
> Created a PR to write null for ≥V3. I like the solution of tying it to the 
> metadata version, since folks have to update their libraries/catalogs anyway 
> to use V3.
> Distilled the words said here around -1 into an implementation note to avoid 
> confusion in the future, let me know what y'all think.
>
> Thanks again everyone for the input here.
>
> Kind regards,
> Fokko
>
>
>
>
>
>
> Op di 18 feb 2025 om 21:55 schreef Daniel Weeks <dwe...@apache.org>:
>>
>> I agree that we shouldn't retroactively change the spec to align with the 
>> implementation.
>>
>> We're trying to strike a balance here and I think the notes are an effective 
>> way to convey that while omitting or producing null with v1/v2 is 
>> technically to spec, it's going to be incompatible with most implementations 
>> and should be discouraged for compatibility reasons.
>>
>> So, no on spec change, yes on notes.
>>
>> On Tue, Feb 18, 2025 at 12:37 PM Russell Spitzer <russell.spit...@gmail.com> 
>> wrote:
>>>
>>> `For compatibility with existing libraries, we should maintain that `-1` is 
>>> equivalent to no snapshot and it should be written for v1/v2.`
>>>
>>> The only issue I have with this is are we saying that for v1 and v2 we are 
>>> changing the spec to say that current-snapshot-id is required? Or are we 
>>> adding an implementation note, the java implementation will always return 
>>> -1 for these tables but this is not required for other clients?
>>>
>>> Unless we say that it is required we can have other implementations which 
>>> can return null or missing. I'm not a big fan of changing the spec to match 
>>> the implementation especially when we are narrowing what is allowed.
>>>
>>>
>>> On Tue, Feb 18, 2025 at 2:15 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>>
>>>> I would agree the best path forward is to note the current behavior for 
>>>> v1/v2 since that's well established and address the behavior in v3.
>>>>
>>>> For compatibility with existing libraries, we should maintain that `-1` is 
>>>> equivalent to no snapshot and it should be written for v1/v2.
>>>>
>>>> With V3 we should support `-1` for backwards compatibility (though there's 
>>>> low likelihood that someone will promote an empty table), but always write 
>>>> `null`.
>>>>
>>>> I believe this is in part due to TableMetadata tracking the 
>>>> current-snapshot-id as a `long` value (as opposed to a Long) reference, 
>>>> which resulted in this oversight.
>>>>
>>>> -Dan
>>>>
>>>> On Tue, Feb 18, 2025 at 11:57 AM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>
>>>>> +1 to reverting PT 11560 in main and 1.8.1. That avoids unnecessary 
>>>>> incompatibility with older readers.
>>>>>
>>>>> I also agree that we should update the spec to say what Russell suggests:
>>>>> > that -1 has meant "no current snapshot" in the past and is equivalent 
>>>>> > to missing/null.
>>>>>
>>>>> That's a correct description of the behavior.
>>>>>
>>>>> I don't think that we should create new requirements for this that didn't 
>>>>> previously exist -- that is, I don't see much value in going back to 
>>>>> mandate this behavior when writing v1 or v2 tables. If an implementation 
>>>>> were writing null for the current snapshot ID up until now, I don't think 
>>>>> that we can say that behavior was or is incorrect. It was an 
>>>>> incompatibility with the Java implementation and we should note the 
>>>>> behavior in the "Implementation Notes" section.
>>>>>
>>>>> > How about reverting #11560 for 1.8.1, and then reinstating this for 
>>>>> > 2.0.0?
>>>>>
>>>>> I think we need to fix this at a format version boundary, not a library 
>>>>> version boundary. I'd be up for reinstating the write change for v3.
>>>>>
>>>>> On Tue, Feb 18, 2025 at 7:56 AM Russell Spitzer 
>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>
>>>>>> The only thing I think I agree with is defining that -1 has meant "no 
>>>>>> current snapshot" in the past and
>>>>>> is equivalent to a missing/nuil (if we have to specify that) .
>>>>>>
>>>>>> I don't think there is any reason to change the behavior of writing null 
>>>>>> / missing unless
>>>>>> that's really a point of confusion for folks. Is there a JSON library 
>>>>>> folks are using
>>>>>>  which distinguishes between missing/null?
>>>>>>
>>>>>> Would having field:null be the same as missing have avoided the issue 
>>>>>> because it seems like libraries
>>>>>> that only handled non-null would also not handle "missing" 
>>>>>> current-snapshot-id as well?
>>>>>>
>>>>>> I'd really like to hear from other implementers here since changing all 
>>>>>> this again would be a lot of
>>>>>> work and I thought we had understanding of Optional == Nullable as a 
>>>>>> valid thing in the spec.
>>>>>>
>>>>>> On Tue, Feb 18, 2025 at 7:35 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>>>
>>>>>>> Correcting myself: schema/spec/sort seem to be always present - please 
>>>>>>> ignore that part in my previous email. The valid values for those 
>>>>>>> fields however should be defined.
>>>>>>>
>>>>>>> On 18.02.25 14:29, Robert Stupp wrote:
>>>>>>>
>>>>>>> Reality is that Iceberg did write '-1' into current-snapshot-id (and 
>>>>>>> other "non-exist" marker values for schema/spec/sort) instead of 
>>>>>>> omitting the field.
>>>>>>>
>>>>>>> Side note: the table-spec says that these fields are optional, but 
>>>>>>> nothing about whether it is nullable.
>>>>>>>
>>>>>>> The spec should at least be amended to explicitly define the valid 
>>>>>>> values (-1 for current-snapshot-id for no snapshot is just a fact now 
>>>>>>> that it's there). IMO that field being 'null' isn't defined in the 
>>>>>>> spec, but the absence of the field is.
>>>>>>>
>>>>>>> Proposal for the implementation:
>>>>>>>
>>>>>>> * revert https://github.com/apache/iceberg/pull/11560 in 1.8.1 and on 
>>>>>>> main
>>>>>>>
>>>>>>> Proposal for the spec (for current-snapshot-id):
>>>>>>>
>>>>>>> * Define the valid value ranges for the field
>>>>>>> * Define that the absence of the field means "no current snapshot"
>>>>>>> * Define that the value -1 of the field means "no current snapshot"
>>>>>>> * Define that the value 'null' of the field means "no current snapshot"
>>>>>>> * Define that new implementations must not write the field if there's 
>>>>>>> no current snapshot
>>>>>>>
>>>>>>> Proposal for the spec (for schema/spec/sort):
>>>>>>>
>>>>>>> * Define the valid value ranges for the fields
>>>>>>> * Define the "schema/spec/sort not present" values (the fields are 
>>>>>>> optional for v1 but required for v2+v3).
>>>>>>> * OR Define that "schema/spec/sort must be absent" if there is no 
>>>>>>> current schema/spec/sort.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>> On 17.02.25 21:07, Russell Spitzer wrote:
>>>>>>>
>>>>>>> It sounds like the argument here is that we should change the Spec for 
>>>>>>> V1, V2, and V3 to mark current-snapshot-id
>>>>>>> as required. Then we should change all other implementations to follow 
>>>>>>> this new standard. I'm not sure that
>>>>>>> is a good solution going forwards but I'm not sure of how we can 
>>>>>>> support catalogs/engines that cannot handle a null
>>>>>>> correctly in this situation otherwise. Perhaps we should source out to 
>>>>>>> see if any other implementers worked off the
>>>>>>> assumption of a non-optional "current-snapshot-id" and if we get a 
>>>>>>> critical mass we can try to make that change?
>>>>>>> Because of how wide that change would be, I think we would need pretty 
>>>>>>> broad consensus to do so.
>>>>>>>
>>>>>>> We could possibly also have a flag to allow the old behavior but that 
>>>>>>> also feels wrong to me, we have often gone with a motto of
>>>>>>> read "wrong" write "correct" for things like this in the past and 
>>>>>>> continuing to write "wrong" is a disservice to any
>>>>>>> new implementers . When we do have a contradiction between our 
>>>>>>> implementation and the spec I think we have
>>>>>>> to trust that implementers followed the spec and fix the core library.
>>>>>>>
>>>>>>> Are there any other solutions here?
>>>>>>>
>>>>>>> On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hey Robert,
>>>>>>>>
>>>>>>>>> The thing is, that -1 cannot "go away".
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, I agree, but that's also the case for null, as the field is 
>>>>>>>> optional in the spec. Therefore we support both in PyIceberg, 
>>>>>>>> Iceberg-Rust, Iceberg-Java, and Iceberg-Go. On the write side, they 
>>>>>>>> all produce null instead of -1. Therefore, I was surprised that it 
>>>>>>>> comes up now, and not earlier.
>>>>>>>>
>>>>>>>>> I'd prefer to keep the previous behavior - otherwise implementations 
>>>>>>>>> may fall back to 0, which is definitely wrong.
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not seeing why it would fall back to 0, and I agree, that's wrong.
>>>>>>>>
>>>>>>>>> Would be better IMHO not to break existing implementations / render 
>>>>>>>>> existing setups incompatible with Iceberg 1.8.
>>>>>>>>
>>>>>>>>
>>>>>>>> In my opinion, if this had been caught in an RC, it would be open for 
>>>>>>>> discussion, but that ship has sailed. Let's hear what others think.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Fokko
>>>>>>>>
>>>>>>>> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de>:
>>>>>>>>>
>>>>>>>>> Hi Fokko,
>>>>>>>>>
>>>>>>>>> sure, in general "absent" or "null" would be cleaner. But now we have 
>>>>>>>>> two representations for the same case - I suspect most went with the 
>>>>>>>>> "reference behavior".
>>>>>>>>>
>>>>>>>>> The thing is, that -1 cannot "go away".
>>>>>>>>>
>>>>>>>>> I'd prefer to keep the previous behavior - otherwise implementations 
>>>>>>>>> may fall back to 0, which is definitely wrong. Would be better IMHO 
>>>>>>>>> not to break existing implementations / render existing setups 
>>>>>>>>> incompatible with Iceberg 1.8.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17.02.25 15:49, Fokko Driesprong wrote:
>>>>>>>>>
>>>>>>>>> Hey Robert,
>>>>>>>>>
>>>>>>>>> Thanks for raising this.
>>>>>>>>>
>>>>>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are 
>>>>>>>>>> not defined in the spec.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For me, this is invalid, since there is no snapshot with -1 in the 
>>>>>>>>> snapshots property. In the tests with the PR, you can see that there 
>>>>>>>>> are no snapshots. A year ago we had a similar discussion on PyIceberg 
>>>>>>>>> around this and this ended up in adding a flag to fall back to the 
>>>>>>>>> old behavior. I do agree that we should have communicated this more 
>>>>>>>>> clearly with the release.
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Fokko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de>:
>>>>>>>>>>
>>>>>>>>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced a 
>>>>>>>>>> behavior change.
>>>>>>>>>>
>>>>>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are 
>>>>>>>>>> not defined in the spec.
>>>>>>>>>>
>>>>>>>>>> Previous Iceberg-Java versions always produced -1 if there's no 
>>>>>>>>>> current snapshot - 1.8 produces `null` in that case. So there are 
>>>>>>>>>> now two _different_ values for "no current snapshot".
>>>>>>>>>>
>>>>>>>>>> Implementations that rely on the behavior of the "reference 
>>>>>>>>>> implementation" (Iceberg-Java) do now fail in case there's no 
>>>>>>>>>> current snapshot.
>>>>>>>>>>
>>>>>>>>>> On 13.02.25 10:09, Amogh Jahagirdar wrote:
>>>>>>>>>>
>>>>>>>>>> I'm pleased to announce the release of Apache Iceberg 1.8.0!
>>>>>>>>>>
>>>>>>>>>> Apache Iceberg is an open table format for huge analytic datasets. 
>>>>>>>>>> Iceberg
>>>>>>>>>> delivers high query performance for tables with tens of petabytes of 
>>>>>>>>>> data,
>>>>>>>>>> along with atomic commits, concurrent writes, and SQL-compatible 
>>>>>>>>>> table
>>>>>>>>>> evolution.
>>>>>>>>>>
>>>>>>>>>> This release can be downloaded from: 
>>>>>>>>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz
>>>>>>>>>>
>>>>>>>>>> Release notes: https://iceberg.apache.org/releases/#180-release
>>>>>>>>>>
>>>>>>>>>> Java artifacts are available from Maven Central.
>>>>>>>>>>
>>>>>>>>>> Thanks to everyone for contributing!
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Robert Stupp
>>>>>>>>>> @snazy
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Stupp
>>>>>>>>> @snazy
>>>>>>>
>>>>>>> --
>>>>>>> Robert Stupp
>>>>>>> @snazy
>>>>>>>
>>>>>>> --
>>>>>>> Robert Stupp
>>>>>>> @snazy

Reply via email to