Bumping this thread again.

I received this SQL from a user today.

CREATE TABLE partition_by_epoch_timestamp (
>   `id` STRING,
>   `data` DECIMAL(10,0),
>   `start_ts` DECIMAL(10,0),
>   `end_ts` DECIMAL(10,0)
>   )
> USING iceberg
> PARTITIONED BY (
>     days(from_unixtime(start_ts))
> )


Of course it fails now. Do you think it's possible to support? Or something
similar?

Thanks,
Manu

On Fri, Sep 13, 2024 at 3:51 AM rdb...@gmail.com <rdb...@gmail.com> wrote:

> I was thinking about adding to the specification how the current date/time
> transforms work with source columns that are longs. I don't think it helps
> much to introduce a new transform, although that would make the
> interpretation of the source data more obvious to users applying the
> transform.
>
> On Wed, Sep 11, 2024 at 10:26 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Maybe we could update the time-based partition functions to be applied
>>> to a long column directly. It would treat that column like a timestamp in
>>> milliseconds. Would that work? I need to think more about the implications
>>> of doing that, but I don't think that we currently have an issue with
>>> extending the supported source column types like that.
>>
>>
>> By update do you mean adding new functions like
>> HOUR_FROM_UNIX_EPOCH_MILLIS or overloading the definition to accept Long
>> values and treat them as milliseconds. The former seems more reasonable to
>> me. The latter I think has many of the same draw-backs raised on the other
>> thread.  IMO, both aren't super pleasing from a long term maintainability
>> perspective.
>>
>> Cheers,
>> Micah
>>
>> On Tue, Sep 10, 2024 at 8:50 AM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>>
>>> Maybe we could update the time-based partition functions to be applied
>>> to a long column directly. It would treat that column like a timestamp in
>>> milliseconds. Would that work? I need to think more about the implications
>>> of doing that, but I don't think that we currently have an issue with
>>> extending the supported source column types like that.
>>>
>>> On Mon, Sep 9, 2024 at 9:05 PM Manu Zhang <owenzhang1...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to bump this thread since we don't want to allow long to
>>>> timestamp promotion in V3
>>>> <https://lists.apache.org/thread/79y866zdhs2fmyv0nsfq3xvdsmqh7h8c>.
>>>> What other options do we have?
>>>>
>>>> Thanks,
>>>> Manu
>>>>
>>>> On Fri, Apr 5, 2024 at 12:09 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Ah yes, milestone is fine. Thanks !
>>>>>
>>>>> All good.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Thu, Apr 4, 2024 at 5:12 PM Eduard Tudenhoefner <edu...@tabular.io>
>>>>> wrote:
>>>>> >
>>>>> > There is the V3 Spec milestone where it's tracked (amongst other
>>>>> things).
>>>>> >
>>>>> > On Thu, Apr 4, 2024 at 9:44 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>> wrote:
>>>>> >>
>>>>> >> Hi Eduard,
>>>>> >>
>>>>> >> Thanks for the update ! It makes sense to me.
>>>>> >>
>>>>> >> Maybe a GH label with spec or v3_spec would help to see what is
>>>>> planned for v3 ?
>>>>> >>
>>>>> >> Regards
>>>>> >> JB
>>>>> >>
>>>>> >> On Thu, Apr 4, 2024 at 9:36 AM Eduard Tudenhoefner <
>>>>> edu...@tabular.io> wrote:
>>>>> >> >
>>>>> >> > Type promotion from Long to Timestamp is on the roadmap for the
>>>>> V3 Spec, so that would be the preferred way.
>>>>> >> >
>>>>> >> > On Wed, Apr 3, 2024 at 10:38 AM Jean-Baptiste Onofré <
>>>>> j...@nanthrax.net> wrote:
>>>>> >> >>
>>>>> >> >> Hi Manu
>>>>> >> >>
>>>>> >> >> TIMESTAMP_LONG type promotion could be the easiest way, it would
>>>>> work
>>>>> >> >> with the existing transform.
>>>>> >> >>
>>>>> >> >> Would it work for you?
>>>>> >> >>
>>>>> >> >> Regards
>>>>> >> >> JB
>>>>> >> >>
>>>>> >> >> On Wed, Apr 3, 2024 at 5:56 AM Manu Zhang <
>>>>> owenzhang1...@gmail.com> wrote:
>>>>> >> >> >
>>>>> >> >> > Hi all,
>>>>> >> >> >
>>>>> >> >> > We have source data with a timestamp field in LONG type to
>>>>> land in an Iceberg table. We want to partition the table with the 
>>>>> timestamp
>>>>> field such that we can query recent data more efficiently. However, LONG 
>>>>> is
>>>>> not supported as the source type of time-based transform (hour, day, etc)
>>>>> >> >> >
>>>>> >> >> > I find the previous discussion
>>>>> https://github.com/apache/iceberg/issues/417 and Ryan suggested two
>>>>> solutions
>>>>> >> >> >
>>>>> >> >> > 1. type promotion from LONG to TIMESTAMP
>>>>> >> >> > 2. custom transform
>>>>> >> >> >
>>>>> >> >> > As I understand it, neither solution has already been
>>>>> implemented yet. Is there any progress in either direction? Which solution
>>>>> does the community prefer? Any other suggestions are also appreciated.
>>>>> >> >> >
>>>>> >> >> > Thanks,
>>>>> >> >> > Manu
>>>>>
>>>>

Reply via email to