It wouldn't be the first time we've retroactively updated the spec
when finding inconsistencies with the current implementations :P

Particularly, in this case even the "reference implementation" (i.e.
Java) is technically not spec-compliant since the spec says that it
should be an "int", not an Avro "date" type. If all the
implementations currently write a "date" type, then it's silly to have
to say that every implementation is violating the spec.

If we want the spec to say it should be an int, but tolerate reading
an Avro "date" type, that's fine. But that would mean we should update
Java, Rust, and PyIceberg to all write plain "int" and no longer write
the "date" type. Again, it would be silly to say that the reference
implementation and 2 other implementations are not following the spec.
:P

I agree that it would be a big change for little value to update the
implementations, so my opinion is that the spec should be updated to
either say that "either" is allowed to be written, or that "date"
should be written but "int" should be allowed to be read.
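
To make "either may be read" concrete, here is a minimal sketch of a
tolerant reader-side check. The function names are hypothetical and not
taken from any implementation, and the sketch assumes the field schema
has already been unwrapped from any Avro union such as ["null", ...]:

```python
import datetime

# Hypothetical helper names, for illustration only.
EPOCH = datetime.date(1970, 1, 1)


def is_acceptable_day_schema(schema):
    """Return True for either Avro form of a `day` partition field."""
    if schema == "int":
        # Plain Avro int, the literal reading of the transform table.
        return True
    if isinstance(schema, dict) and schema.get("type") == "int":
        # Avro int with no annotation, or annotated as a logical date.
        return schema.get("logicalType") in (None, "date")
    return False


def days_to_date(days):
    """Both forms store the same physical value: days since 1970-01-01."""
    return EPOCH + datetime.timedelta(days=days)
```

Under this check a reader accepts both "int" and
{ "type": "int", "logicalType": "date" }, and the stored integer is
interpreted the same way in both cases.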

--Matt

On Wed, May 20, 2026 at 1:05 PM Fokko Driesprong <[email protected]> wrote:
>
> Thanks for the quick PR Andrei.
>
> The problem is that the note conflicts with the Avro/Iceberg types table: 
> https://iceberg.apache.org/spec/#avro
>
> I don't think we want to update the implementations as I agree that it would 
> be a big change for little value. At the same time, I don't think we can 
> retroactively update the spec. Maybe an implementation note would be a better 
> solution to halt the tradition?
>
> Kind regards,
> Fokko
>
>
> On 2026/05/20 16:49:29 Andrei Tserakhau via dev wrote:
> > Thanks, Fokko, for the historical context!
> >
> > Quick check that we're aligned, since I think we may be closer than
> > it reads:
> >
> > My PR leaves the result type table as `int` -- no change to the
> > transform table, no impact on hour/month/etc., no change to the
> > type model.
> >
> > What the PR clarifies is the Avro encoding used when serializing a
> > `day` partition field into a manifest. Empirically today, Java,
> > PyIceberg, and Rust all write `{ "type": "int", "logicalType": "date" }`
> > there (TypeToSchema in Java, DayTransform.result_type in PyIceberg,
> > Transform::Day.result_type in Rust all produce a Date). Only
> > iceberg-go produces plain Avro `int`. The PR codifies the de facto
> > writer behavior as SHOULD and makes reader tolerance MUST.
> >
> > If your "stick with int" also covers the Avro annotation, then we'd
> > effectively be reverting three writers and orphaning every existing
> > manifest, which I don't think is a decent path: it's quite a big change
> > for small benefits.
> >
> > Either way, I'm happy to adjust the spec change. The goal is to
> > stop this tradition of re-litigating the issue every year because
> > this part of the spec keeps being misread.
> >
> > Best,
> > Andrei
> >
> > On Wed, May 20, 2026 at 6:37 PM Fokko Driesprong <[email protected]> wrote:
> >
> > > Thanks for bringing this up Kevin, a gift that keeps on giving :)
> > > https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427
> > >
> > > 1. I think we should stick with the int type as defined in the spec.
> > > 2. It feels to me that some readers are more permissive here than others.
> > > I believe some allow reading date as an int without throwing. Practically,
> > > readers should read both.
> > > 3. Unfortunately, I think this is water under the bridge. As shown above in
> > > the GitHub Issue, we went back and forth, so I don't see a lot of value in
> > > switching this to date. All OSS implementations handle this as an int
> > > internally, and this also aligns with hour/month/etc.
> > >
> > > Hope this historical context helps.
> > >
> > > Kind regards,
> > > Fokko
> > >
> > >
> > > On 2026/05/20 16:33:51 Andrei Tserakhau via dev wrote:
> > > > Here is a fast follow with a PR:
> > > > https://github.com/apache/iceberg/pull/16446
> > > >
> > > > Best,
> > > > Andrei
> > > >
> > > > On Wed, May 20, 2026 at 6:11 PM Andrei Tserakhau <
> > > > [email protected]> wrote:
> > > >
> > > > > Thanks for raising this, Kevin.
> > > > >
> > > > > Speaking as an iceberg-go maintainer, even though Go is the
> > > > > implementation that has to move, I'd vote:
> > > > >
> > > > > 1. Writers SHOULD emit { "type": "int", "logicalType": "date" }.
> > > > > 2. Readers MUST accept both plain `int` and `int` annotated with
> > > > >    `logicalType: date`.
> > > > > 3. Keep the transform result type table as-is (`int` as the logical
> > > > >    Iceberg type). Don't change it to `date`. Add a separate, normative
> > > > >    manifest-encoding clause so projection and expression-evaluation
> > > > >    semantics that depend on the type model stay untouched.
> > > > >
> > > > > Reasoning: when Java, PyIceberg, and Rust all write logical `date`,
> > > > > that's the de facto wire format. Forcing them to switch to plain `int`
> > > > > to match a literal reading of the transform table would churn three
> > > > > implementations and leave every existing manifest "non-conforming"
> > > > > forever. Aligning Go with the dominant writer convention costs one
> > > > > > implementation change (PR #915 already proposes it) and zero
> > > > > > historical churn.
> > > > >
> > > > > The underlying ambiguity is that "result type" (logical Iceberg type)
> > > > > and "Avro manifest encoding" (wire format) were conflated. Separating
> > > > > them in spec text removes the ambiguity without changing the type
> > > > > system.
> > > > >
> > > > > Happy to drive the spec PR and then iceberg-go writer + reader
> > > > > alignment.
> > > > >
> > > > > Best,
> > > > > Andrei
> > > > >
> > > > > > On Tue, May 19, 2026 at 5:45 PM Kevin Liu <[email protected]> wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > > >> I'd like to invite the community to discuss a spec ambiguity in Apache
> > > > > >> Iceberg that has caused some confusion across implementations. We've seen
> > > > > >> this come up in Python, Rust, and now Go.
> > > > >>
> > > > > >> The issue: the spec documents the `day` partition transform's result type
> > > > > >> as plain `int`, but Java, PyIceberg, and Rust all write manifest partition
> > > > > >> fields using Avro's logical `date` type. Go currently writes plain `int`,
> > > > > >> which is the strict reading of the spec. Since both forms have the same
> > > > > >> physical representation, the difference is only the Avro schema annotation
> > > > > >> -- but it's worth clarifying the spec so all implementations are aligned.
> > > > >>
> > > > >> The full analysis, including a breakdown of each implementation's
> > > > >> writer/reader behavior and proposed resolution options, is here:
> > > > >> https://github.com/apache/iceberg/issues/16414
> > > > >>
> > > > >> At a high level, the questions for the community are:
> > > > > >> 1. What should implementations write: Avro `int` (plain integer) or Avro
> > > > > >> `date` (integer with a date logical type)?
> > > > > >> 2. Should implementations be required to read both forms, or just
> > > > > >> encouraged to?
> > > > > >> 3. Should the spec's transform result type table be updated from `int` to
> > > > > >> `date`?
> > > > >>
> > > > >> I'd love to hear your thoughts. Thanks!
> > > > >>
> > > > >> Best,
> > > > >> Kevin Liu
> > > > >>
> > > > >
> > > >
> > >
> >
