Yes. When we return the Spark type, it shows up as date and Spark correctly displays the value.
On Mon, Sep 30, 2024 at 9:56 AM Kevin Liu <kevin.jq....@gmail.com> wrote:
> Thank you both for the insights and context.
>
> As Russell pointed out, the "day partition transform" result is indeed of
> int type. The Types.DateType
> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>
> corresponds to TypeID.DATE
> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/api/src/main/java/org/apache/iceberg/types/Types.java#L181>,
> which is also an Integer type
> <https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37>.
> So, this behavior conforms to the spec.
>
> The issue with DayTransform in PyIceberg (#1208
> <https://github.com/apache/iceberg-python/pull/1208>) is due to the
> changes in the PR. The problem arises from how the partition value is
> displayed in the partition metadata table. As Ryan mentioned, Spark
> displays the partition value as `date`. However, the PR removes
> `DateType` as the `result_type`, which causes PyIceberg to display the
> partition value as an `int` (days since the epoch).
>
> > if we just change the type to `date`, engines could correctly display
> > the value
>
> I found a related discussion in apache/iceberg#279
> <https://github.com/apache/iceberg/issues/279#issuecomment-521322801>,
> specifically: "That will cause the partition tuple's field type to be a
> date, which should also cause the metadata table to display formatted dates
> instead of the day ordinal in Spark."
>
> I want to confirm my understanding: is this behavior due to the
> Iceberg-to-Spark DateType conversion in `TypeToSparkType`
> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java#L103-L104>?
>
> Best,
> Kevin
>
> On Fri, Sep 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> The background is that the result of the day function and dates are
>> basically the same: the number of days from the Unix epoch. When we started
>> using metadata tables, we realized that a lot of people use the day
>> function but then get a weird ordinal value out, but if we just change the
>> type to `date`, engines could correctly display the value. This isn't
>> required by the spec, it's just a convenience.
>>
>> On Fri, Sep 27, 2024 at 8:30 AM Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> Good thing DateType is an Integer :)
>>> https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37
>>>
>>> On Thu, Sep 26, 2024 at 8:38 PM Kevin Liu <kevin.jq....@gmail.com>
>>> wrote:
>>>
>>>> Hey folks,
>>>>
>>>> While reviewing a PR to fix DayTransform in PyIceberg (#1208
>>>> <https://github.com/apache/iceberg-python/pull/1208>), we found an
>>>> inconsistency between the spec and the Java Iceberg library.
>>>>
>>>> According to the spec
>>>> <https://iceberg.apache.org/spec/#partition-transforms>, the result
>>>> type for the "day partition transform" should be `int`, similar to other
>>>> time-based partition transforms (year/month/hour). However, in the Java
>>>> Iceberg library, the result type for the day partition transform is
>>>> `DateType` (source
>>>> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>).
>>>> This seems to be a discrepancy from the spec, as the day partition
>>>> transform is the only time-based transform with a non-int result
>>>> type, whereas the others use IntegerType (source
>>>> <https://grep.app/search?q=getResultType&filter[repo][0]=apache/iceberg&filter[path][0]=api/src/main/java/org/apache/iceberg/>
>>>> ).
>>>>
>>>> Could someone confirm if my understanding is correct? If so, is there
>>>> any historical context for this difference? Lastly, how should we approach
>>>> resolving this moving forward?
>>>>
>>>> Best,
>>>> Kevin
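To make the point in this thread concrete, here is a minimal Python sketch using only the standard library. It is not PyIceberg's or the Java library's implementation, and the helper names (`year_transform`, `month_transform`, `day_transform`, `hour_transform`, `day_ordinal_as_date`) are hypothetical. It assumes inputs are plain `date` values or timezone-aware UTC `datetime` values. Following the spec's description, each time-based transform yields an integer ordinal counted from the Unix epoch. The day ordinal is the same days-since-epoch number a `date` represents, so typing the day result as `date` only changes how the value is displayed.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical helpers for illustration only; not a PyIceberg or Iceberg API.
EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def year_transform(d: date) -> int:
    """Years from 1970 (an int result)."""
    return d.year - 1970


def month_transform(d: date) -> int:
    """Months from 1970-01 (an int result)."""
    return (d.year - 1970) * 12 + (d.month - 1)


def day_transform(d: date) -> int:
    """Days from 1970-01-01: the same number a date value represents."""
    return (d - EPOCH_DATE).days


def hour_transform(ts: datetime) -> int:
    """Hours from 1970-01-01 00:00:00 UTC; expects a timezone-aware datetime."""
    # timedelta // timedelta floors and returns an int
    return (ts - EPOCH_TS) // timedelta(hours=1)


def day_ordinal_as_date(ordinal: int) -> date:
    """Reinterpret a day ordinal as a calendar date (display only)."""
    return EPOCH_DATE + timedelta(days=ordinal)


if __name__ == "__main__":
    d = date(2024, 9, 30)
    ordinal = day_transform(d)
    print(ordinal)                       # 19996 (shown when the field is typed int)
    print(day_ordinal_as_date(ordinal))  # 2024-09-30 (shown when typed date)
```

The round trip through `day_ordinal_as_date` loses nothing. This is why the Java library can report `DateType` for the day transform without changing the stored partition values.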