Thank you both for the insights and context. As Russell pointed out, the result of the "day partition transform" is indeed of int type. The Types.DateType <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47> corresponds to TypeID.DATE <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/api/src/main/java/org/apache/iceberg/types/Types.java#L181>, which is also an Integer type <https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37>. So, this behavior conforms to the spec.
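
To make the equivalence concrete, here is a minimal sketch in plain Python. It is not PyIceberg's or the Java library's implementation; the helper names `day_transform` and `ordinal_to_date` are hypothetical and only illustrate that the day ordinal and a date carry the same information, namely the number of days from the Unix epoch.

```python
from datetime import date, timedelta

# Unix epoch as a calendar date.
EPOCH = date(1970, 1, 1)


def day_transform(d: date) -> int:
    """Hypothetical helper: number of days from the Unix epoch to `d`."""
    return (d - EPOCH).days


def ordinal_to_date(days: int) -> date:
    """Hypothetical helper: interpret a day ordinal as a calendar date."""
    return EPOCH + timedelta(days=days)


d = date(2024, 9, 27)
ordinal = day_transform(d)

print(ordinal)                   # 19993, what an `int` result type would display
print(ordinal_to_date(ordinal))  # 2024-09-27, what a `date` result type would display
```

Both printed values come from the same stored integer, so the choice of result type here only changes how the value is presented.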
The issue with DayTransform in PyIceberg (#1208 <https://github.com/apache/iceberg-python/pull/1208>) is due to the changes in the PR. The problem arises from how the partition value is displayed in the partition metadata table. As Ryan mentioned, Spark displays the partition value as `date`. However, the PR removes `DateType` as the `result_type`, which causes PyIceberg to display the partition value as an `int` (the number of days since the epoch).

> if we just change the type to `date`, engines could correctly display the value

I found a related discussion in apache/iceberg/#279 <https://github.com/apache/iceberg/issues/279#issuecomment-521322801>, specifically:

"That will cause the partition tuple's field type to be a date, which should also cause the metadata table to display formatted dates instead of the day ordinal in Spark."

I want to confirm my understanding: is this behavior due to the Iceberg-to-Spark DateType conversion in `TypeToSparkType` <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java#L103-L104>?

Best,
Kevin

On Fri, Sep 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com> wrote:

> The background is that the result of the day function and dates are
> basically the same: the number of days from the Unix epoch. When we started
> using metadata tables, we realized that a lot of people use the day
> function but then get a weird ordinal value out, but if we just change the
> type to `date`, engines could correctly display the value. This isn't
> required by the spec, it's just a convenience.
>
> On Fri, Sep 27, 2024 at 8:30 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> Good thing DateType is an Integer :)
>> https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37
>>
>> On Thu, Sep 26, 2024 at 8:38 PM Kevin Liu <kevin.jq....@gmail.com> wrote:
>>
>>> Hey folks,
>>>
>>> While reviewing a PR to fix DayTransform in PyIceberg (#1208
>>> <https://github.com/apache/iceberg-python/pull/1208>), we found an
>>> inconsistency between the spec and the Java Iceberg library.
>>>
>>> According to the spec
>>> <https://iceberg.apache.org/spec/#partition-transforms>, the result
>>> type for the "day partition transform" should be `int`, similar to other
>>> time-based partition transforms (year/month/hour). However, in the Java
>>> Iceberg library, the result type for day partition transform is `DateType`
>>> (source
>>> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>).
>>> This seems to be a discrepancy from the spec, as the day partition
>>> transform is the only time-based transform with a non-int result
>>> type, whereas the others use IntegerType (source
>>> <https://grep.app/search?q=getResultType&filter[repo][0]=apache/iceberg&filter[path][0]=api/src/main/java/org/apache/iceberg/>).
>>>
>>> Could someone confirm if my understanding is correct? If so, is there
>>> any historical context for this difference? Lastly, how should we approach
>>> resolving this moving forward?
>>>
>>> Best,
>>> Kevin