Thanks for confirming! To close the loop on this issue, we have added more documentation about the `result_type` function in PyIceberg, clarifying the physical and display representations of partition transforms. For DayTransform, the physical representation is `int` (the number of days since the Unix epoch), while the display representation is `date`. This conforms to the spec and aligns with Spark's behavior. The changes were made in apache/iceberg-python#1211 <https://github.com/apache/iceberg-python/pull/1211>.
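
For anyone skimming the thread later, here is a minimal sketch of the physical vs. display distinction in plain Python. It deliberately does not use the PyIceberg API: `day_transform` and `display_as_date` are hypothetical helper names made up for illustration, and normalizing the timestamp to UTC before truncating to a day is an assumption of the sketch.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Hypothetical helper: physical value, whole days since the Unix epoch."""
    # Assumption: timezone-aware input, normalized to UTC before truncation.
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def display_as_date(day_ordinal: int) -> date:
    """Hypothetical helper: display value, the same ordinal read as a date."""
    return EPOCH + timedelta(days=day_ordinal)


ts = datetime(2024, 10, 7, 16, 53, tzinfo=timezone.utc)
ordinal = day_transform(ts)          # what is stored: an int
shown = display_as_date(ordinal)     # what an engine can show: a date
print(ordinal, shown.isoformat())
```

Both values carry the same information; only the type attached to the partition field decides whether a reader sees the ordinal or a formatted date.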
Thanks,
Kevin

On Mon, Oct 7, 2024 at 4:53 PM rdb...@gmail.com <rdb...@gmail.com> wrote:

> Yes. When we return the Spark type, it shows up as date and Spark
> correctly displays the value.
>
> On Mon, Sep 30, 2024 at 9:56 AM Kevin Liu <kevin.jq....@gmail.com> wrote:
>
>> Thank you both for the insights and context.
>>
>> As Russell pointed out, the "day partition transform" result is truly of
>> int type. The Types.DateType
>> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>
>> corresponds to TypeID.DATE
>> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/api/src/main/java/org/apache/iceberg/types/Types.java#L181>,
>> which is also an Integer type
>> <https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37>.
>> So, this behavior conforms to the spec.
>>
>> The issue with DayTransform in PyIceberg (#1208
>> <https://github.com/apache/iceberg-python/pull/1208>) is due to the
>> changes in the PR. The problem arises from how the partition value is
>> displayed in the partition metadata table. As Ryan mentioned, Spark
>> displays the partition value as `date`. However, the PR removes
>> `DateType` as the `result_type`, which causes PyIceberg to display the
>> partition value as an `int` number of days since the epoch.
>>
>> > if we just change the type to `date`, engines could correctly display
>> > the value
>>
>> I found a related discussion in apache/iceberg#279
>> <https://github.com/apache/iceberg/issues/279#issuecomment-521322801>,
>> specifically: "That will cause the partition tuple's field type to be a
>> date, which should also cause the metadata table to display formatted dates
>> instead of the day ordinal in Spark."
>>
>> I want to confirm my understanding: is this behavior due to the
>> Iceberg-to-Spark DateType conversion in `TypeToSparkType`
>> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java#L103-L104>?
>>
>> Best,
>> Kevin
>>
>> On Fri, Sep 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>>
>>> The background is that the result of the day function and dates are
>>> basically the same: the number of days from the Unix epoch. When we started
>>> using metadata tables, we realized that a lot of people use the day
>>> function and then get a weird ordinal value out, but
>>> if we just change the type to `date`, engines could correctly display
>>> the value. This isn't required by the spec; it's just a convenience.
>>>
>>> On Fri, Sep 27, 2024 at 8:30 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> Good thing DateType is an Integer :)
>>>> https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37
>>>>
>>>> On Thu, Sep 26, 2024 at 8:38 PM Kevin Liu <kevin.jq....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey folks,
>>>>>
>>>>> While reviewing a PR to fix DayTransform in PyIceberg (#1208
>>>>> <https://github.com/apache/iceberg-python/pull/1208>), we found an
>>>>> inconsistency between the spec and the Java Iceberg library.
>>>>>
>>>>> According to the spec
>>>>> <https://iceberg.apache.org/spec/#partition-transforms>, the result
>>>>> type for the "day partition transform" should be `int`, similar to other
>>>>> time-based partition transforms (year/month/hour).
>>>>>
>>>>> However, in the Java Iceberg library, the result type for the day
>>>>> partition transform is `DateType` (source
>>>>> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>).
>>>>> This seems to be a discrepancy from the spec, as the day partition
>>>>> transform is the only time-based transform with a non-int result
>>>>> type, whereas the others use IntegerType (source
>>>>> <https://grep.app/search?q=getResultType&filter[repo][0]=apache/iceberg&filter[path][0]=api/src/main/java/org/apache/iceberg/>).
>>>>>
>>>>> Could someone confirm if my understanding is correct? If so, is there
>>>>> any historical context for this difference? Lastly, how should we approach
>>>>> resolving this moving forward?
>>>>>
>>>>> Best,
>>>>> Kevin