Hi all,

There was recently a discussion on how to interpret the spec regarding the "timezone" field of the timestamp type (and on the different timestamp-related types that Arrow should have). See
https://lists.apache.org/thread.html/r017084eed74edbc95810fc049056570f45b0bb034d6eeadd647e8621%40%3Cdev.arrow.apache.org%3E

Somewhat related, I want to start a discussion about the extent to which we want to implement functionality (compute kernels) in Arrow C++ to deal with timezones.

We just merged a PR that adds kernels to extract fields from timestamps (year, month, day, hour, etc. -> ARROW-11759 <https://github.com/apache/arrow/pull/10176>). But once you start writing kernels for timestamp data, you quickly run into the question: what to do with tz-aware timestamps? For example, we have:

- ARROW-12980 <https://issues.apache.org/jira/browse/ARROW-12980> about making the kernels that extract timestamp fields timezone-aware. For example, if you have a tz-aware timestamp with time "09:30:00+02:00", this is stored internally as "07:30:00 UTC" (plus the actual timezone as metadata of the type). A kernel that extracts the "hour" field should return 9 and not 7 (which is what we would get if we used the internal UTC value and ignored the timezone information).
- ARROW-13033 <https://issues.apache.org/jira/browse/ARROW-13033> (which I opened today) about adding functionality to convert a tz-naive "local time" (local "clock" time in a not-yet-specified time zone) to a proper timezone-aware timestamp with the user-specified time zone attached. This is useful for data that does not have sufficient timezone information attached to the data/type itself, but for which you know what the timezone should be. For example, you have a timestamp with time "09:30:00" (no explicit timezone, so implicitly treated as UTC), but the user knows this is actually "09:30:00 CEST"; you then want to convert it to the UTC time ("07:30:00Z") that is equivalent to "09:30:00 CEST".

Both kinds of kernels require a conversion between "UTC time" and tz-naive "local time" (C++ local_t <https://en.cppreference.com/w/cpp/chrono/local_t>), which requires looking up the offset for the given timezone at that point in time (the first example requires a conversion from UTC to local time, the second from local time to UTC).
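
To make the two conversion directions concrete, here is a minimal sketch in Python using the standard-library zoneinfo module (Python 3.9+, which needs a timezone database on the system). It is not Arrow code: the function names hour_in_zone and assume_zone are made up for illustration, and "Europe/Brussels" is just an assumed zone that is on CEST (UTC+02:00) in summer.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9; reads the system tz database

# Assumed example zone: CEST (UTC+02:00) in summer, CET (UTC+01:00) in winter.
TZ = ZoneInfo("Europe/Brussels")


def hour_in_zone(utc_naive: datetime, tz: ZoneInfo) -> int:
    """UTC -> local: return the wall-clock 'hour' in tz for a value stored as UTC."""
    utc_aware = utc_naive.replace(tzinfo=timezone.utc)
    return utc_aware.astimezone(tz).hour


def assume_zone(local_naive: datetime, tz: ZoneInfo) -> datetime:
    """local -> UTC: interpret a tz-naive wall-clock time as being in tz
    and return the equivalent UTC value (tz-naive, as it would be stored)."""
    local_aware = local_naive.replace(tzinfo=tz)
    return local_aware.astimezone(timezone.utc).replace(tzinfo=None)


# First example: stored value is 07:30 UTC, the zone is at +02:00 on this date.
print(hour_in_zone(datetime(2021, 6, 10, 7, 30), TZ))   # 9
# The offset depends on the time point: in winter the same zone is at +01:00.
print(hour_in_zone(datetime(2021, 1, 10, 7, 30), TZ))   # 8
# Second example: 09:30 local clock time in that zone corresponds to 07:30 UTC.
print(assume_zone(datetime(2021, 6, 10, 9, 30), TZ))    # 2021-06-10 07:30:00
```

Note that the local -> UTC direction is not always well defined: around DST transitions a local time can be ambiguous or nonexistent, so a real kernel would need some policy for those cases (this sketch simply takes zoneinfo's default behaviour).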

Personally, I think kernels that can handle timezones are important (if we want users to store tz-aware data in Arrow), but I want to ensure we are generally OK with expanding the scope of Arrow to actually start doing something with the tz information of the timestamp type (up to now we only store that value in the type and never interpret it). That means dealing with timezone offsets, timezone databases, etc. Luckily, the date.h library (https://github.com/HowardHinnant/date) that we vendor already includes all the required functionality.

Best,
Joris