Yes. When we return the Spark type, it shows up as date and Spark correctly
displays the value.
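
To illustrate why the two types are interchangeable, here is a minimal sketch
in plain Python (standard library only; this is not Iceberg or PyIceberg
code). The helper names `days_since_epoch` and `ordinal_to_date` are
hypothetical, chosen for this example. It shows that the day ordinal and a
date carry the same information, so typing the value as a date only changes
how it is displayed.

```python
from datetime import date, timedelta

# The Unix epoch, the reference point for both the day ordinal and a date.
EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Return the number of days between the Unix epoch and `d`."""
    return (d - EPOCH).days


def ordinal_to_date(days: int) -> date:
    """Interpret a day ordinal as a calendar date."""
    return EPOCH + timedelta(days=days)


d = date(2024, 9, 30)
ordinal = days_since_epoch(d)

# The integer is what an engine shows when the field is typed as int.
print(ordinal)
# The formatted date is what it can show when the field is typed as date.
print(ordinal_to_date(ordinal).isoformat())
```

The round trip is lossless in both directions, which is why the choice of
result type here is a display convenience rather than a change in the stored
value.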

On Mon, Sep 30, 2024 at 9:56 AM Kevin Liu <kevin.jq....@gmail.com> wrote:

> Thank you both for the insights and context.
>
> As Russell pointed out, the result of the "day partition transform" is
> indeed of int type. The Types.DateType
> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>
> corresponds to TypeID.DATE
> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/api/src/main/java/org/apache/iceberg/types/Types.java#L181>,
> which is also an Integer type
> <https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37>.
> So this behavior conforms to the spec.
>
> The issue with DayTransform in PyIceberg (#1208
> <https://github.com/apache/iceberg-python/pull/1208>) is due to the
> changes in the PR. The problem arises from how the partition value is
> displayed in the partition metadata table. As Ryan mentioned, Spark
> displays the partition value as `date`. However, the PR removes
> `DateType` as the `result_type`, which causes PyIceberg to display the
> partition value as an `int` count of days since the epoch.
>
> > if we just change the type to `date`, engines could correctly display
> the value
>
> I found a related discussion in apache/iceberg/#279
> <https://github.com/apache/iceberg/issues/279#issuecomment-521322801>,
> specifically: "That will cause the partition tuple's field type to be a
> date, which should also cause the metadata table to display formatted dates
> instead of the day ordinal in Spark." I want to confirm my understanding:
> is this behavior due to the Iceberg-to-Spark DateType conversion in
> `TypeToSparkType`
> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java#L103-L104>
> ?
>
> Best,
> Kevin
>
>
>
> On Fri, Sep 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> The background is that the result of the day function and a date are
>> basically the same: the number of days from the Unix epoch. When we started
>> using metadata tables, we realized that a lot of people use the day
>> function but then get a weird ordinal value out. If we just change the
>> type to `date`, engines could correctly display the value. This isn't
>> required by the spec; it's just a convenience.
>>
>> On Fri, Sep 27, 2024 at 8:30 AM Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> Good thing DateType is an Integer :)
>>> https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37
>>>
>>> On Thu, Sep 26, 2024 at 8:38 PM Kevin Liu <kevin.jq....@gmail.com>
>>> wrote:
>>>
>>>> Hey folks,
>>>>
>>>> While reviewing a PR to fix DayTransform in PyIceberg (#1208
>>>> <https://github.com/apache/iceberg-python/pull/1208>), we found an
>>>> inconsistency between the spec and the Java Iceberg library.
>>>>
>>>> According to the spec
>>>> <https://iceberg.apache.org/spec/#partition-transforms>, the result
>>>> type for the "day partition transform" should be `int`, similar to other
>>>> time-based partition transforms (year/month/hour). However, in the Java
>>>> Iceberg library, the result type for day partition transform is `DateType`
>>>> (source
>>>> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>).
>>>> This seems to be a discrepancy from the spec, as the day partition
>>>> transform is the only time-based transform with a non-int result
>>>> type, whereas the others use IntegerType (source
>>>> <https://grep.app/search?q=getResultType&filter[repo][0]=apache/iceberg&filter[path][0]=api/src/main/java/org/apache/iceberg/>).
>>>>
>>>> Could someone confirm if my understanding is correct? If so, is there
>>>> any historical context for this difference? Lastly, how should we approach
>>>> resolving this moving forward?
>>>>
>>>> Best,
>>>> Kevin
>>>>
>>>>
