On Thu, 10 Jun 2021 17:33:23 +0200
Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
> 
> We just merged a PR to add some kernels to extract fields from timestamps
> (year, month, day, hour, etc.; see ARROW-11759
> <https://github.com/apache/arrow/pull/10176>). But once you start with
> kernels for timestamp data, you quickly run into the question: what to do
> with tz-aware timestamps?
> 
> For example, we have:
> - ARROW-12980 <https://issues.apache.org/jira/browse/ARROW-12980> about
> making the kernels that extract timestamp fields timezone-aware. For
> example, if you have a tz-aware timestamp with hour "09:30:00+02:00", this is
> stored internally as "07:30:00 UTC" (+ the actual timezone as metadata of
> the type). And for a kernel to extract the "hour" field, you want that to
> return 9 and not 7 (which would happen if we use the internal UTC value
> ignoring the timezone information).
> - ARROW-13033 <https://issues.apache.org/jira/browse/ARROW-13033> (which I
> opened today) about adding functionality to convert a tz-naive "local time"
> (local "clock" time in a not-yet-specified time zone) to a properly
> timezone-aware timestamp with the user-specified time zone attached. This
> can be useful to handle data that does not have sufficient timezone
> information attached to the data/type itself, but for which you know what
> the timezone should be. For example, having a timestamp with hour
> "09:30:00" (no explicit timezone, implicitly UTC), but the user knows this
> is actually "09:30:00 CEST", so then you want to convert this to the UTC
> time ("07:30:00Z") that is equivalent to "09:30:00 CEST".

I don't think it's helpful to discuss those two use cases together.
The first case is talking about the semantics of a kernel on valid
timestamp data.
The second case is talking about invalid timestamp data (with values
expressed in a non-UTC timezone).
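
To make the difference between the two cases concrete, here is a minimal
sketch in plain Python (standard library only; it does not use the Arrow
kernels). The helper names local_hour and assume_timezone are hypothetical,
chosen only for illustration, and CEST is approximated by a fixed +02:00
offset rather than a real time zone database entry.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +02:00 offset stands in for CEST (no DST rules).
CEST = timezone(timedelta(hours=2), "CEST")


def local_hour(utc_value: datetime, tz: timezone) -> int:
    """Case 1: extract the hour of a valid UTC-stored timestamp in its zone."""
    return utc_value.astimezone(tz).hour


def assume_timezone(naive_local: datetime, tz: timezone) -> datetime:
    """Case 2: treat a naive wall-clock value as being in tz, return UTC."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


# Case 1: the stored value is already correct (07:30 UTC); only the
# field extraction has to take the zone into account.
stored = datetime(2021, 6, 10, 7, 30, tzinfo=timezone.utc)
print(stored.hour)               # 7
print(local_hour(stored, CEST))  # 9

# Case 2: the stored value is a wall-clock time, so the value itself
# has to be shifted to obtain the equivalent UTC instant.
naive = datetime(2021, 6, 10, 9, 30)
print(assume_timezone(naive, CEST).isoformat())  # 2021-06-10T07:30:00+00:00
```

In the first case the stored value is never modified; in the second the
value itself changes, which is why the two are separate questions.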

Regards

Antoine.

