> We are recommending that the behavior of
> these functions should consistently have the UTC interpretation of the
> value rather than using the system locale. This is what Python does
> with "tz-naive" datetime.datetime objects
This is not quite true, although perhaps my reading is incorrect. I read
that as "Python functions treat a naive timestamp as if it were a UTC
timestamp." Python does not treat a naive timestamp the same as a UTC
timestamp, and I think this is the heart of Julian's point (which I agree
with). For example, consider this snippet:

    >>> import datetime
    >>> import pytz
    >>> x = datetime.datetime.now()
    >>> y = pytz.utc.localize(x)
    >>> x - y
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can't subtract offset-naive and offset-aware datetimes

x is not assumed to be UTC (if it were, I would get datetime.timedelta(0)
instead of an exception). Another example:

    >>> x.isoformat()
    '2021-06-04T09:09:18.304640'
    >>> y.isoformat()
    '2021-06-04T09:09:18.304640+00:00'

On Fri, Jun 4, 2021 at 7:46 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
> The learning there is: library software shouldn’t use anything from its
> environment (time zone, locale, encoding, endianness). Functions that use
> time zone should always have a time zone parameter.
>
> Once you take that step, the functions that work with zoneless timestamps
> start to look different to functions that work with local timestamps, and you
> start to realize that they should be separate data types.
>
> > On Jun 3, 2021, at 12:26 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > Arrow's decision was not to permit storage of timestamps with
> > "localized" representation (which is distinct from UTC internal
> > representation with a different time zone set). The problem really
> > comes down to the interpretation of "time zone naive" timestamps on
> > different systems: operations in my opinion should not yield different
> > results depending on the particular locale of the system where the
> > operations are being run.
> >
> > date on my Linux system returns 1622748048, which is 19:21 UTC.
> > If you encounter 1622748048 without any given time zone, and want to
> > interpret 1622748048 as CDT (US/Central where I live), then Arrow is
> > asking you to localize that timestamp to the UTC representation of
> > 19:21 CDT, which is 7 hours later, so you need to add 7 hours of
> > seconds to the timestamp to adjust it to UTC.
> >
> > In some systems, if you encounter 1622748048 without time zone
> > indicated, the behavior of timestamp_day() or timestamp_hour() will
> > depend on the system locale. We are recommending that the behavior of
> > these functions should consistently have the UTC interpretation of the
> > value rather than using the system locale. This is what Python does
> > with "tz-naive" datetime.datetime objects — if you call access
> > datetime.hour on a timezone-less datetime.datetime, it will return the
> > same result no matter where in the world you are.
> >
> > On Thu, Jun 3, 2021 at 1:19 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>
> >> It seems that Arrow’s timestamp type can either have no time zone or be
> >> UTC. I think that is a flawed design, because doesn’t catch user errors.
> >>
> >> Suppose you want to find the number of milliseconds between two
> >> timestamps. If the first has a timezone and the second is implicitly UTC,
> >> then you can convert them both to instants and subtract. But if the first
> >> has a timezone and the second has no time zone, you must supply a time
> >> zone for the second. So, the subtraction function will have a different
> >> signature.
> >>
> >> There are many similar operations, where a time zone needs to be supplied,
> >> or where you cannot safely mix timestamps with different time zones.
> >>
> >> Julian
> >>
> >>
> >>> On Jun 3, 2021, at 11:07 AM, Adam Hooper <a...@adamhooper.com> wrote:
> >>>
> >>> On Thu, Jun 3, 2021 at 2:02 PM Adam Hooper <a...@adamhooper.com> wrote:
> >>>
> >>>> I understand isAdjustedToUTC=true to mean "timestamp", and
> >>>> isAdjustedToUTC=false to mean, "int64 and I hope somebody attached some
> >>>> docs because
> >>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#local-semantics-timestamps-not-normalized-to-utc
> >>>> lists a whole slew of potential meanings and without extra metadata I'll
> >>>> never be able to figure out what this column means."
> >>>>
> >>>
> >>> Correcting myself here: Parquet isAdjustedToUTC=false does have just one
> >>> meaning. It means encoding a "(year, month, day, hour, minute, second,
> >>> microsecond)" tuple as a single integer.
> >>>
> >>> Adam
> >>>
> >>> --
> >>> Adam Hooper
> >>> +1-514-882-9694
> >>> http://adamhooper.com
> >>
>
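To make the naive vs. aware distinction concrete, here is a minimal sketch that uses only the standard library. It uses datetime.timezone in place of pytz. The helper millis_between is hypothetical and does not come from Arrow or any other library. It follows Julian's suggestion that a function mixing zoned and zoneless values should take an explicit time zone parameter instead of reading one from the environment. The fixed UTC-5 offset standing in for CDT is also an assumption, and it ignores DST rules.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def millis_between(a: datetime, b: datetime,
                   naive_zone: Optional[tzinfo] = None) -> int:
    """Hypothetical helper (not a library API): milliseconds from a to b.

    Aware values are already instants. A naive value is only accepted if
    the caller says which zone to read it in; nothing is taken from the
    environment (no system time zone lookup).
    """
    def as_instant(dt: datetime) -> datetime:
        if dt.tzinfo is not None:
            return dt
        if naive_zone is None:
            raise ValueError("naive datetime given without an explicit zone")
        return dt.replace(tzinfo=naive_zone)

    # timedelta // timedelta yields an int count of whole milliseconds
    return (as_instant(b) - as_instant(a)) // timedelta(milliseconds=1)


naive = datetime(2021, 6, 4, 9, 9, 18)          # no tzinfo attached
aware = naive.replace(tzinfo=timezone.utc)      # same wall clock, pinned to UTC

# Field access on a naive value never consults the system time zone.
print(naive.hour)                               # 9

# Python still refuses to treat the naive value as UTC implicitly.
try:
    naive - aware
except TypeError as exc:
    print(exc)  # CPython: can't subtract offset-naive and offset-aware datetimes

# Assumption: CDT modeled as a fixed UTC-5 offset (no DST transitions).
cdt = timezone(timedelta(hours=-5), "CDT")
print(millis_between(aware, naive, naive_zone=timezone.utc))  # 0
print(millis_between(aware, naive, naive_zone=cdt))           # 18000000
```

In this sketch, naive.hour is location-independent, which is the behavior Wes describes. However, Python will not combine a naive value with an aware one until you tell it which zone the naive value is in. The same wall clock yields a different result depending on the zone you supply. Also note that datetime.timestamp() on a naive value assumes local time, not UTC.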