Good to know about the Avro spec behavior, thanks Ryan. And thank you Andrei for driving the spec clarification. I'll comment on the PR. I don't think we need a vote since this is a clarification and not a change.
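For anyone reading this later, here is a minimal sketch of what "read either date or int" could look like on the reader side. It uses only the Python standard library. The function name `classify_day_field` and the returned labels are hypothetical and are not taken from Java, PyIceberg, Rust, or iceberg-go. It assumes the reader has the Avro type of the `day` partition field available as a JSON string.

```python
import json


def classify_day_field(avro_type_json: str) -> str:
    """Classify the Avro type declared for a `day` partition field.

    Returns "date" for an int annotated with logicalType "date" and
    "int" for a plain int. Both are accepted; any other underlying
    type raises ValueError. (Hypothetical helper, illustration only.)
    """
    schema = json.loads(avro_type_json)

    # Avro allows a primitive type to be written as a bare string.
    if schema == "int":
        return "int"

    if isinstance(schema, dict) and schema.get("type") == "int":
        if schema.get("logicalType") == "date":
            return "date"
        # Missing or unrecognized annotation: fall back to the
        # underlying Avro type, as the Avro spec asks of readers.
        return "int"

    raise ValueError(f"unsupported type for a day partition field: {schema!r}")


if __name__ == "__main__":
    print(classify_day_field('{"type": "int", "logicalType": "date"}'))
    print(classify_day_field('{"type": "int"}'))
```

Either label can then feed the same decoding path, because the stored value is the same integer in both cases.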
On Thu, May 21, 2026 at 1:42 PM Andrei Tserakhau via dev wrote:

> Thanks Kevin, Fokko, and Ryan, looks like we've converged.
>
> Summary of where this lands:
>
> - Result type for day becomes date, matching Java/PyIceberg/Rust's
>   default behavior and the Avro types table in Appendix A.
> - Reader tolerance for historical plain-int manifests is inherited
>   from the Avro spec itself (thanks Ryan for surfacing that; it saves
>   us an Iceberg-side MUST clause).
> - A short note is added under the partition transforms table
>   capturing the historical context, so this doesn't get re-litigated
>   the next time someone reads the spec without the back-story.
>
> PR is updated accordingly: https://github.com/apache/iceberg/pull/16446
>
> Fokko, Kevin, Ryan -- would appreciate a look when you have a moment.
> Happy to iterate further on the note wording if anything reads off.
>
> For iceberg-go, I'll follow up with the writer + reader alignment
> (PR #915 in iceberg-go is already in flight) once the spec change
> lands.
>
> Best,
> Andrei
>
> On Thu, May 21, 2026 at 9:41 PM Ryan Blue wrote:
>
>> Ugh, I think I sent from the wrong email address and my reply didn't go
>> through.
>>
>> Other people have covered the same things here, except for one point: the
>> Avro spec states that readers that don't support an annotation are
>> required to ignore it
>> <https://avro.apache.org/docs/1.11.1/specification/#logical-types>. So
>> the behavior to read either date or int correctly is inherited
>> from the Avro spec.
>>
>> Ryan
>>
>> On Thu, May 21, 2026 at 10:17 AM Kevin Liu wrote:
>>
>>> I wasn’t aware of the previous back-and-forth changes to this line in
>>> the spec. Thanks for the extra context!
>>>
>>> A couple of points I want to align on:
>>> 1. All implementations except Go, including Java, Python, and Rust,
>>> write the day transform result as an Iceberg date type. That maps to the
>>> Avro date type and is serialized as { "type": "int", "logicalType": "date" }.
>>> 2. The Go implementation writes the day transform result as an Iceberg int
>>> type. That maps to the Avro int type and is serialized as { "type": "int" }.
>>> 3. Java, Python, and Rust can read Avro manifest partition values as
>>> either an Avro int type or an Avro date type.
>>> 4. The Go implementation can currently read Avro manifest partition
>>> values only as an Avro int type. This is the original issue that sparked
>>> this conversation.
>>>
>>> Since the spec has gone back and forth between writing this as an
>>> Iceberg int and an Iceberg date, I think readers must accept both. We can
>>> include that as an implementation note.
>>>
>>> I support changing the spec back to date so it matches the default
>>> behavior for day partition values in our implementations. Go is also
>>> making the change to write date instead of int.
>>> The other approach, updating all implementations to match the current
>>> spec, would be a lot of work for little value.
>>>
>>> Hopefully this is the last time we make this change to the spec :)
>>> Would love to hear from others.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Wed, May 20, 2026 at 10:39 AM Fokko Driesprong wrote:
>>>
>>>> > It wouldn't be the first time we've retroactively updated the spec
>>>> > when finding inconsistencies with the current implementations :P
>>>>
>>>> I think generally we try to avoid this, but in this case it was changed
>>>> a few times :P Maybe we should revert the spec change:
>>>>
>>>> https://github.com/apache/iceberg/pull/5980/changes#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588a
>>>>
>>>> Curious to hear what others think.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>>
>>>> On 2026/05/20 17:24:22 Matt Topol wrote:
>>>> > It wouldn't be the first time we've retroactively updated the spec
>>>> > when finding inconsistencies with the current implementations :P
>>>> >
>>>> > Particularly, in this case even the "reference implementation" (i.e.
>>>> > Java) is technically not spec-compliant since the spec says that it
>>>> > should be an "int", not an Avro "date" type. If all the
>>>> > implementations currently write a "date" type, then it's silly to have
>>>> > to say that every implementation is violating the spec.
>>>> >
>>>> > If we want the spec to say it should be an int, but tolerate reading
>>>> > an Avro "date" type, that's fine. But that would mean we should update
>>>> > Java, Rust, and PyIceberg to all write plain "int" and no longer write
>>>> > the "date" type. Again: it would be silly to say that the reference
>>>> > implementation and 2 other implementations are not following the spec.
>>>> > :P
>>>> >
>>>> > I agree that it would be a big change for little value to update the
>>>> > implementations, so my opinion is that the spec should be updated to
>>>> > either say that "either" is allowed to be written, or that "date"
>>>> > should be written but "int" should be allowed to be read.
>>>> >
>>>> > --Matt
>>>> >
>>>> > On Wed, May 20, 2026 at 1:05 PM Fokko Driesprong wrote:
>>>> > >
>>>> > > Thanks for the quick PR Andrei.
>>>> > >
>>>> > > The problem is that the note conflicts with the Avro/Iceberg types
>>>> > > table: https://iceberg.apache.org/spec/#avro
>>>> > >
>>>> > > I don't think we want to update the implementations as I agree that
>>>> > > it would be a big change for little value. At the same time, I don't
>>>> > > think we can retroactively update the spec. Maybe an implementation
>>>> > > note would be a better solution to halt the tradition?
>>>> > >
>>>> > > Kind regards,
>>>> > > Fokko
>>>> > >
>>>> > >
>>>> > > On 2026/05/20 16:49:29 Andrei Tserakhau via dev wrote:
>>>> > > > Thanks Fokko for the historical context!
>>>> > > >
>>>> > > > Quick check that we're aligned, since I think we may be closer than
>>>> > > > it reads:
>>>> > > >
>>>> > > > My PR leaves the result type table as `int` -- no change to the
>>>> > > > transform table, no impact on hour/month/etc., no change to the
>>>> > > > type model.
>>>> > > >
>>>> > > > What the PR clarifies is the Avro encoding used when serializing a
>>>> > > > `day` partition field into a manifest. Empirically today, Java,
>>>> > > > PyIceberg, and Rust all write `{ "type": "int", "logicalType": "date" }`
>>>> > > > there (TypeToSchema in Java, DayTransform.result_type in PyIceberg,
>>>> > > > Transform::Day.result_type in Rust all produce a Date). Only
>>>> > > > iceberg-go produces plain Avro `int`. The PR codifies the de facto
>>>> > > > writer behavior as SHOULD and makes reader tolerance MUST.
>>>> > > >
>>>> > > > If your "stick with int" also covers the Avro annotation, then we'd
>>>> > > > effectively be reverting three writers and orphaning every existing
>>>> > > > manifest, which I don't think is a decent path; it's quite a big
>>>> > > > change for small benefits.
>>>> > > >
>>>> > > > Either way, super happy to adjust the spec wording; the goal is to
>>>> > > > stop this tradition of re-litigating the issue every year by
>>>> > > > misreading this part of the spec.
>>>> > > >
>>>> > > > Best,
>>>> > > > Andrei
>>>> > > >
>>>> > > > On Wed, May 20, 2026 at 6:37 PM Fokko Driesprong wrote:
>>>> > > >
>>>> > > > > Thanks for bringing this up Kevin, a gift that keeps on giving :)
>>>> > > > > https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427
>>>> > > > >
>>>> > > > > 1. I think we should stick with the int type as defined in the spec.
>>>> > > > > 2. It feels to me that some readers are more permissive here than
>>>> > > > > others. I believe some allow reading date as an int without
>>>> > > > > throwing. Practically, readers should read both.
>>>> > > > > 3. Unfortunately, I think this is water under the bridge. As shown
>>>> > > > > above in the GitHub Issue, we went back and forth, so I don't see a
>>>> > > > > lot of value in switching this to date. All OSS implementations
>>>> > > > > handle this as an int internally, and this also aligns with
>>>> > > > > hour/month/etc.
>>>> > > > >
>>>> > > > > Hope this historical context helps.
>>>> > > > >
>>>> > > > > Kind regards,
>>>> > > > > Fokko
>>>> > > > >
>>>> > > > >
>>>> > > > > On 2026/05/20 16:33:51 Andrei Tserakhau via dev wrote:
>>>> > > > > > Here is a fast follow with a PR:
>>>> > > > > > https://github.com/apache/iceberg/pull/16446
>>>> > > > > >
>>>> > > > > > Best,
>>>> > > > > > Andrei
>>>> > > > > >
>>>> > > > > > On Wed, May 20, 2026 at 6:11 PM Andrei Tserakhau wrote:
>>>> > > > > >
>>>> > > > > > > Thanks for raising this, Kevin.
>>>> > > > > > >
>>>> > > > > > > Speaking as an iceberg-go maintainer, even though Go is the
>>>> > > > > > > implementation that has to move, I'd vote:
>>>> > > > > > >
>>>> > > > > > > 1. Writers SHOULD emit { "type": "int", "logicalType": "date" }.
>>>> > > > > > > 2. Readers MUST accept both plain `int` and `int` annotated with
>>>> > > > > > >    `logicalType: date`.
>>>> > > > > > > 3. Keep the transform result type table as-is (`int` as the logical
>>>> > > > > > >    Iceberg type). Don't change it to `date`. Add a separate, normative
>>>> > > > > > >    manifest-encoding clause so projection and expression-evaluation
>>>> > > > > > >    semantics that depend on the type model stay untouched.
>>>> > > > > > >
>>>> > > > > > > Reasoning: when Java, PyIceberg, and Rust all write logical `date`,
>>>> > > > > > > that's the de facto wire format. Forcing them to switch to plain `int`
>>>> > > > > > > to match a literal reading of the transform table would churn three
>>>> > > > > > > implementations and leave every existing manifest "non-conforming"
>>>> > > > > > > forever. Aligning Go with the dominant writer convention costs one
>>>> > > > > > > implementation change (PR #915 already proposes it) and zero historical
>>>> > > > > > > churn.
>>>> > > > > > >
>>>> > > > > > > The underlying ambiguity is that "result type" (logical Iceberg type)
>>>> > > > > > > and "Avro manifest encoding" (wire format) were conflated. Separating
>>>> > > > > > > them in spec text removes the ambiguity without changing the type
>>>> > > > > > > system.
>>>> > > > > > >
>>>> > > > > > > Happy to drive the spec PR and then iceberg-go writer + reader
>>>> > > > > > > alignment.
>>>> > > > > > >
>>>> > > > > > > Best,
>>>> > > > > > > Andrei
>>>> > > > > > >
>>>> > > > > > > On Tue, May 19, 2026 at 5:45 PM Kevin Liu wrote:
>>>> > > > > > >
>>>> > > > > > >> Hi all,
>>>> > > > > > >>
>>>> > > > > > >> I'd like to invite the community to discuss a spec ambiguity in Apache
>>>> > > > > > >> Iceberg that has caused some confusion across implementations. We've seen
>>>> > > > > > >> this come up in Python, Rust, and now Go.
>>>> > > > > > >>
>>>> > > > > > >> The issue: the spec documents the `day` partition transform's result type
>>>> > > > > > >> as plain `int`, but Java, PyIceberg, and Rust all write manifest partition
>>>> > > > > > >> fields using Avro's logical `date` type. Go currently writes plain `int`,
>>>> > > > > > >> which is the strict reading of the spec. Since both forms have the same
>>>> > > > > > >> physical representation, the difference is only the Avro schema annotation
>>>> > > > > > >> -- but it's worth clarifying the spec so all implementations are aligned.
>>>> > > > > > >>
>>>> > > > > > >> The full analysis, including a breakdown of each implementation's
>>>> > > > > > >> writer/reader behavior and proposed resolution options, is here:
>>>> > > > > > >> https://github.com/apache/iceberg/issues/16414
>>>> > > > > > >>
>>>> > > > > > >> At a high level, the questions for the community are:
>>>> > > > > > >> 1. What should implementations write: Avro `int` (plain integer) or Avro
>>>> > > > > > >> `date` (integer with a date logical type)?
>>>> > > > > > >> 2. Should implementations be required to read both forms, or just
>>>> > > > > > >> encouraged to?
>>>> > > > > > >> 3. Should the spec's transform result type table be updated from `int` to
>>>> > > > > > >> `date`?
>>>> > > > > > >>
>>>> > > > > > >> I'd love to hear your thoughts. Thanks!
>>>> > > > > > >>
>>>> > > > > > >> Best,
>>>> > > > > > >> Kevin Liu
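As a footnote to the original message's point that both forms share one physical representation: the stored value is a count of days from 1970-01-01 whether or not the Avro schema carries the `date` annotation. The sketch below illustrates that with the Python standard library only. The helper names `day_ordinal_to_date` and `date_to_day_ordinal` are hypothetical and are not part of any Iceberg or Avro API.

```python
from datetime import date, timedelta

# Both the Iceberg `day` transform and Avro's `date` logical type
# count days from the Unix epoch.
EPOCH = date(1970, 1, 1)


def day_ordinal_to_date(days: int) -> date:
    """Interpret a stored day ordinal as a calendar date."""
    return EPOCH + timedelta(days=days)


def date_to_day_ordinal(d: date) -> int:
    """Produce the integer that is stored for a calendar date."""
    return (d - EPOCH).days


if __name__ == "__main__":
    stored = date_to_day_ordinal(date(2026, 5, 21))
    # The same integer is written for plain int and for int + date;
    # only the schema annotation differs.
    print(stored, day_ordinal_to_date(stored))
```

A reader that sees plain `int` can apply the same conversion when it needs a calendar date, which is why tolerating both encodings costs little.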
