On 02/06/2021 at 14:58, Joris Van den Bossche wrote:
On Wed, 2 Jun 2021 at 13:56, Antoine Pitrou <anto...@python.org> wrote:
Hello,
For the first time I notice this piece of information about the
timestamp type:
/// * If the time zone is set to a valid value, values can be displayed as
///   "localized" to that time zone, even though the underlying 64-bit
///   integers are identical to the same data stored in UTC. Converting
///   between time zones is a metadata-only operation and does not change the
///   underlying values
(from https://github.com/apache/arrow/blob/master/format/Schema.fbs#L223 )
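To make that claim concrete, here is a minimal sketch (using pyarrow; illustrative only, not taken from the spec or this thread) showing that casting between time zones leaves the stored 64-bit integers untouched:

    import pyarrow as pa

    # Two microsecond timestamps, stored as int64 offsets from the Unix epoch.
    arr_utc = pa.array([0, 3_600_000_000], type=pa.timestamp("us", tz="UTC"))

    # "Converting" to another time zone only rewrites the type metadata.
    arr_paris = arr_utc.cast(pa.timestamp("us", tz="Europe/Paris"))

    # The underlying integers are bit-identical in both arrays.
    assert arr_utc.cast(pa.int64()).equals(arr_paris.cast(pa.int64()))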
This seems rather weird to me: timestamps always convey a UTC timestamp
value, optionally decorated with a local timezone? What is the
motivation for such a representation? It is unlike other systems such
as Python, where a timezone-aware timestamp really expresses a local
time value, not a UTC time value.
Just as a reference: pandas uses the same model, storing UTC timestamps for
timezone-aware data (I think NumPy also stored it as UTC, before timezone
support was removed). And databases like PostgreSQL also store timestamps
as UTC internally, AFAIK.
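For instance (an illustrative sketch, not part of the original message), the stored integer behind a pandas tz-aware timestamp does not change when the time zone does:

    import pandas as pd

    ts_utc = pd.Timestamp("2021-06-02 12:00", tz="UTC")
    ts_paris = ts_utc.tz_convert("Europe/Paris")

    # Same instant, same underlying int64 (nanoseconds since the epoch);
    # only the attached time zone, and hence the displayed wall time, differ.
    assert ts_utc.value == ts_paris.value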
The Python standard library datetime.datetime indeed stores localized
timestamps. But an important difference is that Python actually stores the
year/month/day/hour/etc. as separate values, so it directly represents an
actual moment in time in a certain timezone. What we store, on the other
hand, is "Unix time" (an offset from the epoch, January 1st, 1970, at UTC);
I am not sure how you would store a timestamp in a certain timezone in this
model.
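A small sketch of the contrast (again illustrative, not from the original message): the same instant carries different broken-down field values in different zones, but a single Unix-time value:

    from datetime import datetime, timedelta, timezone

    utc = datetime(2021, 6, 2, 12, 0, tzinfo=timezone.utc)
    paris = utc.astimezone(timezone(timedelta(hours=2)))  # CEST, i.e. UTC+2

    # The broken-down fields ("wall time") differ between the two zones...
    assert (utc.hour, paris.hour) == (12, 14)
    # ...but both denote the same offset from the Unix epoch.
    assert utc.timestamp() == paris.timestamp()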
Ah, my bad. I was under the (apparently mistaken) impression that Arrow
was the exception here.
Regards
Antoine.