On 02/06/2021 at 14:58, Joris Van den Bossche wrote:
On Wed, 2 Jun 2021 at 13:56, Antoine Pitrou <anto...@python.org> wrote:


Hello,

I have just noticed, for the first time, this piece of information about
the timestamp type:

   /// * If the time zone is set to a valid value, values can be displayed as
   ///   "localized" to that time zone, even though the underlying 64-bit
   ///   integers are identical to the same data stored in UTC. Converting
   ///   between time zones is a metadata-only operation and does not change the
   ///   underlying values

(from https://github.com/apache/arrow/blob/master/format/Schema.fbs#L223 )
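
Concretely, this seems to mean something like the following minimal
pyarrow sketch (the epoch value is arbitrary, and Array.view is only
used here to peek at the raw int64 storage):

    import pyarrow as pa

    # 2021-06-02 12:00:00 UTC, as microseconds since the Unix epoch
    arr = pa.array([1622635200_000_000],
                   type=pa.timestamp("us", tz="UTC"))

    # "Converting" the timezone is a cast that only rewrites metadata
    local = arr.cast(pa.timestamp("us", tz="Europe/Paris"))

    print(arr.view(pa.int64()))    # [1622635200000000]
    print(local.view(pa.int64()))  # [1622635200000000] -- same storage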

This seems rather weird to me: timestamps always convey a UTC timestamp
value, optionally decorated with a local timezone?  What is the
motivation for such a representation?  It is unlike other systems such
as Python, where a timezone-aware timestamp really expresses a local
time value, not a UTC time value.


Just for reference: pandas uses the same model, storing UTC timestamps for
timezone-aware data (I think NumPy also stored values as UTC, before it
removed timezone support). And databases such as PostgreSQL also store
timestamps as UTC internally, AFAIK.
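
For instance (a small sketch; .asi8 exposes the underlying int64 values
of a DatetimeIndex):

    import pandas as pd

    # pandas also stores timezone-aware timestamps as UTC integers;
    # tz_convert only changes the attached timezone metadata
    idx = pd.DatetimeIndex(["2021-06-02 12:00"], tz="UTC")
    converted = idx.tz_convert("Europe/Brussels")

    print(converted[0])                # 2021-06-02 14:00:00+02:00
    print(idx.asi8 == converted.asi8)  # [ True] -- same underlying values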
The Python standard library datetime.datetime indeed stores localized
timestamps. But an important difference is that Python stores the
year/month/day/hour/etc. as separate fields, so it directly represents a
moment in time in a certain timezone. What we store, on the other hand, is
(I think) "Unix time": the number of time units elapsed since the epoch,
January 1st, 1970, UTC. I am not sure how you would store a timestamp in a
certain timezone in this model.
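
To illustrate the difference with the standard library (a minimal
sketch; a fixed UTC+2 offset stands in for a real timezone):

    from datetime import datetime, timedelta, timezone

    # datetime stores the wall-clock fields plus a tzinfo, so converting
    # to another timezone actually changes the stored fields
    utc = datetime(2021, 6, 2, 12, 0, tzinfo=timezone.utc)
    paris = utc.astimezone(timezone(timedelta(hours=2)))

    print(paris.hour)                            # 14 -- the field changed
    print(utc == paris)                          # True -- same instant
    print(utc.timestamp() == paris.timestamp())  # True -- same Unix time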

Ah, my bad. I was under the (apparently mistaken) impression that Arrow was the exception here.

Regards

Antoine.
