Hi,

I don't know about Pandas, but the question about timestamp precision is
interesting to me nonetheless.
At Starburst, we've had customers asking for nanosecond timestamp
precision, and this drove adding that capability to Trino.
(Actually, picosecond timestamp precision was implemented, but I am not
aware of any use cases explicitly benefiting from the increased precision.
It's still good to be more future-proof, just in case.)

Iceberg currently supports timestamps with microsecond precision.

A quick search over the mailing list brings up a suggestion/workaround to
store such values as two separate numeric fields (e.g. seconds and
nanos-of-second). This is doable when the number of affected columns is
small and one has full control over the schema, but it can be hard to sell
to analysts and BI users.
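For illustration, a minimal sketch of that two-field workaround, assuming the value starts as a nanosecond count since the Unix epoch (which is what pandas.Timestamp.value returns); both resulting fields fit in an Iceberg long column:

```python
# Hypothetical sketch of the two-field workaround: split a nanosecond
# epoch offset into epoch seconds and nanos-of-second, each storable
# as a 64-bit integer column, and reassemble it losslessly.
NS_PER_S = 1_000_000_000

def split_ns(total_ns: int) -> tuple[int, int]:
    """Return (epoch_seconds, nanos_of_second); nanos is always in 0..999999999."""
    # Floor division keeps nanos non-negative even for pre-epoch values.
    return total_ns // NS_PER_S, total_ns % NS_PER_S

def join_ns(seconds: int, nanos: int) -> int:
    """Reassemble the original nanosecond epoch offset."""
    return seconds * NS_PER_S + nanos

# An arbitrary nanosecond-precision epoch value for demonstration.
ts_ns = 1_638_568_320_123_456_789
secs, nanos = split_ns(ts_ns)
assert join_ns(secs, nanos) == ts_ns
assert 0 <= nanos < NS_PER_S
```

The roundtrip is exact, which is the point of the workaround; the downside (as noted above) is that queries and partitioning must deal with two columns instead of one timestamp.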

Are there plans to add timestamps with nanosecond precision?
Would that be a separate timestamp type, or would we rather make the
timestamp type parametric?
Or maybe we just extend the maximum precision of the existing timestamp
type?

Regarding implementation considerations: for engines like Trino, it's
beneficial to know the maximum precision of the data, since
microsecond-precision timestamps can be stored as a 64-bit number and thus
handled more efficiently than nanosecond-precision timestamps.
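A quick back-of-envelope check of the trade-off, assuming epoch offsets stored as a signed 64-bit integer: each extra factor of 1000 in precision shrinks the representable range by the same factor.

```python
# Range of a signed 64-bit epoch offset at microsecond vs nanosecond
# precision (assumption: offsets measured from the Unix epoch).
SECONDS_PER_YEAR = 365.25 * 24 * 3600
max_int64 = 2**63 - 1

years_us = max_int64 / 1_000_000 / SECONDS_PER_YEAR
years_ns = max_int64 / 1_000_000_000 / SECONDS_PER_YEAR

print(f"int64 microseconds: about +/-{years_us:,.0f} years around the epoch")
print(f"int64 nanoseconds:  about +/-{years_ns:,.0f} years around the epoch")

# ~292 years of nanoseconds is the familiar pandas.Timestamp limit
# (roughly years 1677..2262).
assert 292 < years_ns < 293
```

So an int64 of nanoseconds still covers any plausible business date range, but higher precisions (picoseconds and beyond) no longer fit in a single 64-bit word, which is exactly why knowing the maximum precision up front matters for an engine.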

BR
PF



On Fri, Dec 3, 2021 at 9:52 PM Mayur Srivastava <
mayur.srivast...@twosigma.com> wrote:

> Hi,
>
>
>
> Is there a best practice for handling the pandas.Timestamps (or
> numpy.datetime64) in nanos in Iceberg? How are the Python users working
> with the timestamps in nanos precision, especially if it is a part of the
> PartitionSpec?
>
>
>
> Thanks,
>
> Mayur
>
>
>
