Thanks for confirming!

To close the loop on this issue, we have added more documentation about the
`result_type` function in PyIceberg. This clarifies the physical and
display representations of partition transforms. For DayTransform, the
physical representation is `int`, while the display representation is
`date`. This conforms to the spec and aligns with Spark's behavior. The
changes have been made in apache/iceberg-python#1211
<https://github.com/apache/iceberg-python/pull/1211>.
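
To illustrate the distinction, here is a minimal sketch using only the Python
standard library. It is not the PyIceberg implementation, and the names
`day_transform` and `display_day` are hypothetical, chosen just for this
example. It assumes, as discussed in this thread, that the day transform
yields the number of days since the Unix epoch and that a `date` is that same
integer rendered as a calendar date.

```python
from datetime import date, timedelta

# Assumption for illustration: ordinals are counted from the Unix epoch.
EPOCH = date(1970, 1, 1)


def day_transform(d: date) -> int:
    """Physical representation: whole days since the Unix epoch."""
    return (d - EPOCH).days


def display_day(ordinal: int) -> date:
    """Display representation: the same ordinal shown as a calendar date."""
    return EPOCH + timedelta(days=ordinal)


ordinal = day_transform(date(2024, 10, 7))
print(ordinal)               # 20003
print(display_day(ordinal))  # 2024-10-07
```

Both values carry the same information; only the type attached to the
partition field decides which form an engine shows.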

Thanks,
Kevin

On Mon, Oct 7, 2024 at 4:53 PM rdb...@gmail.com <rdb...@gmail.com> wrote:

> Yes. When we return the Spark type, it shows up as date and Spark
> correctly displays the value.
>
> On Mon, Sep 30, 2024 at 9:56 AM Kevin Liu <kevin.jq....@gmail.com> wrote:
>
>> Thank you both for the insights and context.
>>
>> As Russell pointed out, the "day partition transform" result is indeed of
>> int type. Types.DateType
>> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>
>> corresponds to TypeID.DATE
>> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/api/src/main/java/org/apache/iceberg/types/Types.java#L181>,
>> which is also an Integer type
>> <https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37>.
>> So, this behavior conforms to the spec.
>>
>> The issue with DayTransform in PyIceberg (#1208
>> <https://github.com/apache/iceberg-python/pull/1208>) is due to the
>> changes in the PR. The problem arises from how the partition value is
>> displayed in the partition metadata table. As Ryan mentioned, Spark
>> displays the partition value as `date`. However, the PR removes
>> `DateType` as the `result_type`, which causes PyIceberg to display the
>> partition value as an `int` (the number of days since the epoch).
>>
>> > if we just change the type to `date`, engines could correctly display
>> the value
>>
>> I found a related discussion in apache/iceberg#279
>> <https://github.com/apache/iceberg/issues/279#issuecomment-521322801>,
>> specifically: "That will cause the partition tuple's field type to be a
>> date, which should also cause the metadata table to display formatted dates
>> instead of the day ordinal in Spark." I want to confirm my understanding:
>> is this behavior due to the Iceberg-to-Spark DateType conversion in
>> `TypeToSparkType`
>> <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java#L103-L104>?
>>
>> Best,
>> Kevin
>>
>>
>>
>> On Fri, Sep 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>>
>>> The background is that the result of the day function and dates are
>>> basically the same: the number of days from the Unix epoch. When we started
>>> using metadata tables, we realized that a lot of people use the day
>>> function but then get a weird ordinal value out, but if we just change the
>>> type to `date`, engines could correctly display the value. This isn't
>>> required by the spec, it's just a convenience.
>>>
>>> On Fri, Sep 27, 2024 at 8:30 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> Good thing DateType is an Integer :)
>>>> https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37
>>>>
>>>> On Thu, Sep 26, 2024 at 8:38 PM Kevin Liu <kevin.jq....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey folks,
>>>>>
>>>>> While reviewing a PR to fix DayTransform in PyIceberg (#1208
>>>>> <https://github.com/apache/iceberg-python/pull/1208>), we found an
>>>>> inconsistency between the spec and the Java Iceberg library.
>>>>>
>>>>> According to the spec
>>>>> <https://iceberg.apache.org/spec/#partition-transforms>, the result
>>>>> type for the "day partition transform" should be `int`, similar to other
>>>>> time-based partition transforms (year/month/hour). However, in the Java
>>>>> Iceberg library, the result type for the day partition transform is
>>>>> `DateType` (source
>>>>> <https://github.com/apache/iceberg/blob/dddb5f423b353d961b8a08eb2cb4371d453c2959/api/src/main/java/org/apache/iceberg/transforms/Days.java#L47>).
>>>>> This seems to be a discrepancy from the spec, as the day partition
>>>>> transform is the only time-based transform with a non-int result
>>>>> type, whereas the others use IntegerType (source
>>>>> <https://grep.app/search?q=getResultType&filter[repo][0]=apache/iceberg&filter[path][0]=api/src/main/java/org/apache/iceberg/>
>>>>> ).
>>>>>
>>>>> Could someone confirm if my understanding is correct? If so, is there
>>>>> any historical context for this difference? Lastly, how should we approach
>>>>> resolving this moving forward?
>>>>>
>>>>> Best,
>>>>> Kevin
>>>>>
>>>>>
