Thanks Kevin, applied your suggestion; it reads tighter this way.

And right on cue, this bit someone again on the Go side last week
https://github.com/apache/iceberg-go/pull/1176 - compacting a
Spark-written days-partitioned table blew up with "cannot use time.Time
with Avro type int". So the sooner this clarification lands, the sooner
we stop re-litigating it every few months. :)
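
For anyone skimming the thread, here is a minimal sketch of the
tolerance we're after, in Python for brevity. It is not iceberg-go
code: `decode_day`, `encode_day`, and the two schema constants are
hypothetical names, and it only loosely mirrors the failure above by
assuming the reader hands back a date object that the writer must turn
back into days since epoch.

```python
import datetime as dt

EPOCH = dt.date(1970, 1, 1)

# The two Avro field schemas discussed in this thread (illustrative constants).
PLAIN_INT = {"type": "int"}
DATE_INT = {"type": "int", "logicalType": "date"}


def decode_day(schema: dict, raw: int) -> dt.date:
    """Decode a `day` partition value from either schema form.

    Both forms carry the same physical int (days since 1970-01-01), so the
    logicalType annotation does not change how the value is decoded.
    """
    if schema.get("type") != "int":
        raise TypeError(f"unexpected Avro type: {schema.get('type')!r}")
    if schema.get("logicalType") not in (None, "date"):
        raise TypeError(f"unexpected logicalType: {schema['logicalType']!r}")
    return EPOCH + dt.timedelta(days=raw)


def encode_day(value) -> int:
    """Normalize a day value to days since epoch before writing it."""
    # datetime is a subclass of date, so reduce it to a date first.
    if isinstance(value, dt.datetime):
        value = value.date()
    if isinstance(value, dt.date):
        return (value - EPOCH).days
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise TypeError(f"cannot encode {type(value).__name__} as a day value")


if __name__ == "__main__":
    raw = encode_day(dt.date(2026, 6, 11))
    # Same result whichever schema the manifest was written with.
    print(decode_day(PLAIN_INT, raw), decode_day(DATE_INT, raw))
```

The point is only that a writer should normalize to the physical int
whichever schema form it read, which is the step that was missing in
the compaction path.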

One process note: since it touches format/spec.md, the contributor
guide treats it as a spec change that needs a formal vote even for a
clarification (no lazy-consensus modifier). You've almost approved,
and Russell too, so we're close: a couple more PMC +1s
on the vote thread and we can merge
https://github.com/apache/iceberg/pull/16446 and close
the loop. I'll start a [VOTE] thread now to make it official.

Best,
Andrei

On Thu, Jun 11, 2026 at 2:37 AM Kevin Liu <[email protected]> wrote:

> We never closed the loop on this :)
>
> I have one suggestion to keep the explanation format-agnostic; please take
> a look!
> https://github.com/apache/iceberg/pull/16446#pullrequestreview-4472647904
> I'm also happy to merge the PR as is. The most important part is to change
> the result type from `int` -> `date`
>
> Best,
> Kevin Liu
>
> On Fri, May 22, 2026 at 9:00 PM Gang Wu <[email protected]> wrote:
>
>> FWIW, iceberg-cpp also produces a date type for the day transform so
>> we are happy with the consensus here.
>>
>> On Sat, May 23, 2026 at 12:14 AM Kevin Liu <[email protected]> wrote:
>> >
>> > Good to know about the Avro spec behavior, thanks Ryan.
>> >
>> > And thank you Andrei for driving the spec clarification. I'll comment
>> on the PR. I don't think we need a vote since this is a clarification and
>> not a change.
>> >
>> > On Thu, May 21, 2026 at 1:42 PM Andrei Tserakhau via dev <
>> [email protected]> wrote:
>> >>
>> >> Thanks Kevin, Fokko, and Ryan, looks like we've converged.
>> >>
>> >> Summary of where this lands:
>> >>
>> >>   - Result type for day becomes date, matching Java/PyIceberg/Rust's
>> >>   default behavior and the Avro types table in Appendix A.
>> >>   - Reader tolerance for historical plain-int manifests is inherited
>> >>   from the Avro spec itself (thanks Ryan for surfacing that; it saves
>> >>   us an Iceberg-side MUST clause).
>> >>   - A short note is added under the partition transforms table
>> >>   capturing the historical context, so this doesn't get re-litigated
>> >>   the next time someone reads the spec without the back-story.
>> >>
>> >> PR is updated accordingly:
>> https://github.com/apache/iceberg/pull/16446
>> >>
>> >> Fokko, Kevin, Ryan -- would appreciate a look when you have a moment.
>> >> Happy to iterate further on the note wording if anything reads off.
>> >>
>> >> For iceberg-go, I'll follow up with the writer + reader alignment
>> >> (PR #915 in iceberg-go is already in flight) once the spec change
>> >> lands.
>> >>
>> >> Best,
>> >> Andrei
>> >>
>> >> On Thu, May 21, 2026 at 9:41 PM Ryan Blue <[email protected]> wrote:
>> >>>
>> >>> Ugh, I think I sent from the wrong email address and my reply didn't
>> go through.
>> >>>
>> >>> Other people have covered the same things here, except for one point:
>> the Avro spec states that readers that don't support an annotation are
>> required to ignore it. So the behavior to read either date or int correctly
>> is inherited from the Avro spec.
>> >>>
>> >>> Ryan
>> >>>
>> >>> On Thu, May 21, 2026 at 10:17 AM Kevin Liu <[email protected]>
>> wrote:
>> >>>>
>> >>>> I wasn’t aware of the previous back-and-forth changes to this line
>> in the spec. Thanks for the extra context!
>> >>>>
>> >>>> A couple of points I want to align on:
>> >>>> 1. All implementations except Go, including Java, Python, and Rust,
>> write the day transform result as an Iceberg date type. That maps to the
>> Avro date type and is serialized as { "type": "int", "logicalType": "date"
>> }.
>> >>>> 2. The Go implementation writes the day transform result as an Iceberg
>> int type. That maps to the Avro int type and is serialized as { "type":
>> "int" }.
>> >>>> 3. Java, Python, and Rust can read Avro manifest partition values as
>> either an Avro int type or an Avro date type.
>> >>>> 4. The Go implementation can currently read Avro manifest partition
>> values only as an Avro int type. This is the original issue that sparked
>> this conversation.
>> >>>>
>> >>>> Since the spec has gone back and forth between writing this as an
>> Iceberg int and an Iceberg date, I think readers must accept both. We can
>> include that as an implementation note.
>> >>>>
>> >>>> I support changing the spec back to date so it matches the default
>> behavior for day partition values in our implementations. Go is also making
>> the change to write date instead of int.
>> >>>> The other approach, updating all implementations to match the
>> current spec, would be a lot of work for little value.
>> >>>>
>> >>>> Hopefully this is the last time we make this change to the spec :)
>> >>>> Would love to hear from others.
>> >>>>
>> >>>> Best,
>> >>>> Kevin Liu
>> >>>>
>> >>>> On Wed, May 20, 2026 at 10:39 AM Fokko Driesprong <[email protected]>
>> wrote:
>> >>>>>
>> >>>>> > It wouldn't be the first time we've retroactively updated the
>> spec when finding inconsistencies with the current implementations :P
>> >>>>>
>> >>>>> I think generally we try to avoid this, but in this case it was
>> changed a few times :P Maybe we should revert the spec change:
>> >>>>>
>> >>>>>
>> https://github.com/apache/iceberg/pull/5980/changes#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588a
>> >>>>>
>> >>>>> Curious to hear what others think.
>> >>>>>
>> >>>>> Kind regards,
>> >>>>> Fokko
>> >>>>>
>> >>>>>
>> >>>>> On 2026/05/20 17:24:22 Matt Topol wrote:
>> >>>>> > It wouldn't be the first time we've retroactively updated the spec
>> >>>>> > when finding inconsistencies with the current implementations :P
>> >>>>> >
>> >>>>> > Particularly, in this case even the "reference implementation"
>> (i.e.
>> >>>>> > Java) is technically not spec-compliant since the spec says that
>> it
>> >>>>> > should be an "int", not an Avro "date" type. If all the
>> >>>>> > implementations currently write a "date" type, then it's silly to
>> have
>> >>>>> > to say that every implementation is violating the spec.
>> >>>>> >
>> >>>>> > If we want the spec to say it should be an int, but tolerate
>> reading
>> >>>>> > an Avro "date" type, that's fine. But that would mean we should
>> update
>> >>>>> > Java, Rust, and PyIceberg to all write plain "int" and no longer
>> write
>> >>>>> > the "date" type, again: it would be silly to say that the
>> reference
>> >>>>> > implementation and 2 other implementations are not following the
>> spec.
>> >>>>> > :P
>> >>>>> >
>> >>>>> > I agree that it would be a big change for little value to update
>> the
>> >>>>> > implementations, so my opinion is that the spec should be updated
>> to
>> >>>>> > either say that "either" is allowed to be written, or that "date"
>> >>>>> > should be written but "int" should be allowed to be read.
>> >>>>> >
>> >>>>> > --Matt
>> >>>>> >
>> >>>>> > On Wed, May 20, 2026 at 1:05 PM Fokko Driesprong <
>> [email protected]> wrote:
>> >>>>> > >
>> >>>>> > > Thanks for the quick PR Andrei.
>> >>>>> > >
>> >>>>> > > The problem is that the note conflicts with the Avro/Iceberg
>> types table: https://iceberg.apache.org/spec/#avro
>> >>>>> > >
>> >>>>> > > I don't think we want to update the implementations as I agree
>> that it would be a big change for little value. At the same time, I don't
>> think we can retroactively update the spec. Maybe an implementation note
>> would be a better solution to halt the tradition?
>> >>>>> > >
>> >>>>> > > Kind regards,
>> >>>>> > > Fokko
>> >>>>> > >
>> >>>>> > >
>> >>>>> > > On 2026/05/20 16:49:29 Andrei Tserakhau via dev wrote:
>> >>>>> > > > Thanks Fokko for the historical context!
>> >>>>> > > >
>> >>>>> > > > Quick check that we're aligned, since I think we may be
>> closer than
>> >>>>> > > > it reads:
>> >>>>> > > >
>> >>>>> > > > My PR leaves the result type table as `int` -- no change to
>> the
>> >>>>> > > > transform table, no impact on hour/month/etc., no change to
>> the
>> >>>>> > > > type model.
>> >>>>> > > >
>> >>>>> > > > What the PR clarifies is the Avro encoding used when
>> serializing a
>> >>>>> > > > `day` partition field into a manifest. Empirically today,
>> Java,
>> >>>>> > > > PyIceberg, and Rust all write `{ "type": "int",
>> "logicalType": "date" }`
>> >>>>> > > > there (TypeToSchema in Java, DayTransform.result_type in
>> PyIceberg,
>> >>>>> > > > Transform::Day.result_type in Rust all produce a Date). Only
>> >>>>> > > > iceberg-go produces plain Avro `int`. The PR codifies the de
>> facto
>> >>>>> > > > writer behavior as SHOULD and makes reader tolerance MUST.
>> >>>>> > > >
>> >>>>> > > > If your "stick with int" also covers the Avro annotation,
>> then we'd
>> >>>>> > > > effectively be reverting three writers and orphaning every
>> existing
>> >>>>> > > > manifest, which I don't think is a decent path; it's quite a big
>> change
>> >>>>> > > > for small benefits.
>> >>>>> > > >
>> >>>>> > > > Either way, super happy to adjust the spec wording; the
>> goal is to
>> >>>>> > > > stop this tradition of re-litigating the issue every year by
>> misreading
>> >>>>> > > > this part of the spec.
>> >>>>> > > >
>> >>>>> > > > Best,
>> >>>>> > > > Andrei
>> >>>>> > > >
>> >>>>> > > > On Wed, May 20, 2026 at 6:37 PM Fokko Driesprong <
>> [email protected]> wrote:
>> >>>>> > > >
>> >>>>> > > > > Thanks for bringing this up Kevin, a gift that keeps on
>> giving :)
>> >>>>> > > > >
>> https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427
>> >>>>> > > > >
>> >>>>> > > > > 1. I think we should stick with the int type as defined in
>> the spec.
>> >>>>> > > > > 2. It feels to me that some readers are more permissive
>> here than others.
>> >>>>> > > > > I believe some allow reading date as an int without
>> throwing. Practically,
>> >>>>> > > > > readers should read both.
>> >>>>> > > > > 3. Unfortunately, I think this is water under the bridge. As
>> shown above in
>> >>>>> > > > > the GitHub Issue, we went back and forth, so I don't see a
>> lot of value in
>> >>>>> > > > > switching this to date. All OSS implementations handle this
>> as an int
>> >>>>> > > > > internally, and this also aligns with hour/month/etc.
>> >>>>> > > > >
>> >>>>> > > > > Hope this historical context helps.
>> >>>>> > > > >
>> >>>>> > > > > Kind regards,
>> >>>>> > > > > Fokko
>> >>>>> > > > >
>> >>>>> > > > >
>> >>>>> > > > > On 2026/05/20 16:33:51 Andrei Tserakhau via dev wrote:
>> >>>>> > > > > > Here is a fast follow with a PR:
>> >>>>> > > > > > https://github.com/apache/iceberg/pull/16446
>> >>>>> > > > > >
>> >>>>> > > > > > Best,
>> >>>>> > > > > > Andrei
>> >>>>> > > > > >
>> >>>>> > > > > > On Wed, May 20, 2026 at 6:11 PM Andrei Tserakhau <
>> >>>>> > > > > > [email protected]> wrote:
>> >>>>> > > > > >
>> >>>>> > > > > > > Thanks for raising this, Kevin.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Speaking as an iceberg-go maintainer, even though Go is
>> the
>> >>>>> > > > > > > implementation that has to move, I'd vote:
>> >>>>> > > > > > >
>> >>>>> > > > > > > 1. Writers SHOULD emit { "type": "int", "logicalType":
>> "date" }.
>> >>>>> > > > > > > 2. Readers MUST accept both plain `int` and `int`
>> annotated with
>> >>>>> > > > > > >    `logicalType: date`.
>> >>>>> > > > > > > 3. Keep the transform result type table as-is (`int` as
>> the logical
>> >>>>> > > > > > >    Iceberg type). Don't change it to `date`. Add a
>> separate, normative
>> >>>>> > > > > > >    manifest-encoding clause so projection and
>> expression-evaluation
>> >>>>> > > > > > >    semantics that depend on the type model stay
>> untouched.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Reasoning: when Java, PyIceberg, and Rust all write
>> logical `date`,
>> >>>>> > > > > > > that's the de facto wire format. Forcing them to switch
>> to plain `int`
>> >>>>> > > > > > > to match a literal reading of the transform table would
>> churn three
>> >>>>> > > > > > > implementations and leave every existing manifest
>> "non-conforming"
>> >>>>> > > > > > > forever. Aligning Go with the dominant writer
>> convention costs one
>> >>>>> > > > > > > implementation change (PR #915 already proposes it) and
>> zero historical
>> >>>>> > > > > > > churn.
>> >>>>> > > > > > >
>> >>>>> > > > > > > The underlying ambiguity is that "result type" (logical
>> Iceberg type)
>> >>>>> > > > > > > and "Avro manifest encoding" (wire format) were
>> conflated. Separating
>> >>>>> > > > > > > them in spec text removes the ambiguity without
>> changing the type
>> >>>>> > > > > > > system.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Happy to drive the spec PR and then iceberg-go writer +
>> reader
>> >>>>> > > > > > > alignment.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Best,
>> >>>>> > > > > > > Andrei
>> >>>>> > > > > > >
>> >>>>> > > > > > > On Tue, May 19, 2026 at 5:45 PM Kevin Liu <
>> [email protected]>
>> >>>>> > > > > wrote:
>> >>>>> > > > > > >
>> >>>>> > > > > > >> Hi all,
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> I'd like to invite the community to discuss a spec
>> ambiguity in Apache
>> >>>>> > > > > > >> Iceberg that has caused some confusion across
>> implementations. We've
>> >>>>> > > > > seen
>> >>>>> > > > > > >> this come up in Python, Rust, and now Go.
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> The issue: the spec documents the `day` partition
>> transform's result
>> >>>>> > > > > type
>> >>>>> > > > > > >> as plain `int`, but Java, PyIceberg, and Rust all
>> write manifest
>> >>>>> > > > > partition
>> >>>>> > > > > > >> fields using Avro's logical `date` type. Go currently
>> writes plain
>> >>>>> > > > > `int`,
>> >>>>> > > > > > >> which is the strict reading of the spec. Since both
>> forms have the
>> >>>>> > > > > same
>> >>>>> > > > > > >> physical representation, the difference is only the
>> Avro schema
>> >>>>> > > > > annotation
>> >>>>> > > > > > >> -- but it's worth clarifying the spec so all
>> implementations are
>> >>>>> > > > > aligned.
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> The full analysis, including a breakdown of each
>> implementation's
>> >>>>> > > > > > >> writer/reader behavior and proposed resolution
>> options, is here:
>> >>>>> > > > > > >> https://github.com/apache/iceberg/issues/16414
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> At a high level, the questions for the community are:
>> >>>>> > > > > > >> 1. What should implementations write: Avro `int`
>> (plain integer) or
>> >>>>> > > > > Avro
>> >>>>> > > > > > >> `date` (integer with a date logical type)?
>> >>>>> > > > > > >> 2. Should implementations be required to read both
>> forms, or just
>> >>>>> > > > > > >> encouraged to?
>> >>>>> > > > > > >> 3. Should the spec's transform result type table be
>> updated from
>> >>>>> > > > > `int` to
>> >>>>> > > > > > >> `date`?
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> I'd love to hear your thoughts. Thanks!
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> Best,
>> >>>>> > > > > > >> Kevin Liu
>> >>>>> > > > > > >>
>> >>>>> > > > > > >
>> >>>>> > > > > >
>> >>>>> > > > >
>> >>>>> > > >
>> >>>>> >
>>
>
