On Thu, 10 Jun 2021 at 18:06, Antoine Pitrou <anto...@python.org> wrote:
>
> On Thu, 10 Jun 2021 17:33:23 +0200
> Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
> >
> > We just merged a PR to add some kernels to extract fields from timestamps
> > (year, month, day, hour, etc -> ARROW-11759
> > <https://github.com/apache/arrow/pull/10176>). But once you start with
> > kernels for timestamp data, you quickly run into the question: what to do
> > with tz-aware timestamps with a timezone?
> >
> > For example, we have:
> > - ARROW-12980 <https://issues.apache.org/jira/browse/ARROW-12980> about
> > making those kernels to extract timestamp fields timezone aware. For
> > example, if you have tz-aware timestamp with hour "09:30:00+02:00", this is
> > stored internally as "07:30:00 UTC" (+ the actual timezone as metadata of
> > the type). And for a kernel to extract the "hour" field, you want that to
> > return 9 and not 7 (which would happen if we use the internal UTC value
> > ignoring the timezone information).
> > - ARROW-13033 <https://issues.apache.org/jira/browse/ARROW-13033> (which I
> > opened today) about adding functionality to convert a tz-naive "local time"
> > (local "clock" time in a not-yet-specified time zone) to a properly
> > timezone-aware timestamp with the user-specified time zone attached. This
> > can be useful to handle data that does not have sufficient timezone
> > information attached to the data/type itself, but for which you know what
> > the timezone should be. For example, having a timestamp with hour
> > "09:30:00" (no explicit timezone, implicitly UTC), but the user knows this
> > is actually "09:30:00 CEST", so then you want to convert this to the UTC
> > time ("07:30:00Z") that is equivalent to "09:30:00 CEST".
>
> I don't think it's helpful to discuss those two use cases together.
> The first case is talking about the semantics of a kernel on valid
> timestamp data.
> The second case is talking about invalid timestamp data (with values
> expressed in a non-UTC timezone).
>
What both cases have in common is that they need to look up timezone
offsets to do a conversion, and thus require access to a timezone
database (which in turn requires us to deal with things like Windows not
having a system tz database available).
That was the main aspect I wanted to make sure we are OK with in general
("dealing with timezones"), more than the specifics of the two examples
I gave. If that general issue turns out not to be much of a discussion
point, I think that would be a good start.

And then indeed each case where we might want to add timezone handling
can be discussed separately (since adding it to a second or third kernel
is much less of an issue than *starting* to do timezone handling).

Joris

> Regards
>
> Antoine.
>
>
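
For illustration only, here is a minimal Python sketch of why both cases end up needing a timezone database. It uses the standard-library zoneinfo module, not Arrow's kernels. The "Europe/Brussels" zone is an assumed stand-in for the CEST example, and the function names local_hour and assume_timezone are hypothetical, not an existing or proposed Arrow API.

```python
from datetime import datetime, timezone
# zoneinfo needs a tz database; where the system has none (e.g. Windows),
# the separate `tzdata` package is the usual source.
from zoneinfo import ZoneInfo

# Assumed zone for the example; it observes CEST (+02:00) in June 2021.
BRUSSELS = ZoneInfo("Europe/Brussels")


def local_hour(utc_value: datetime, tz: ZoneInfo) -> int:
    """Case 1: extract the hour as seen in `tz`, not in the stored UTC value."""
    return utc_value.astimezone(tz).hour


def assume_timezone(naive_local: datetime, tz: ZoneInfo) -> datetime:
    """Case 2: treat a tz-naive wall-clock time as local time in `tz`
    and return the equivalent UTC instant."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


# Case 1: the value is stored as 07:30 UTC, the type says Europe/Brussels.
stored = datetime(2021, 6, 10, 7, 30, tzinfo=timezone.utc)
print(local_hour(stored, BRUSSELS))      # 9

# Case 2: the value is a naive 09:30 that the user knows is CEST.
naive = datetime(2021, 6, 10, 9, 30)
print(assume_timezone(naive, BRUSSELS))  # 2021-06-10 07:30:00+00:00
```

Both functions go through the same offset lookup for the given zone and date, which is the shared dependency described above.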