> We are recommending that the behavior of
> these functions should consistently have the UTC interpretation of the
> value rather than using the system locale. This is what Python does
> with "tz-naive" datetime.datetime objects

This is not quite true, although perhaps my reading is incorrect.  I
read that as "Python functions treat a naive timestamp as if it were a
UTC timestamp."  Python does not treat a naive timestamp the same as a
UTC timestamp, and I think this is the heart of Julian's point (which I
agree with).  For example, consider this snippet:

>>> import datetime
>>> import pytz
>>> x = datetime.datetime.now()
>>> y = pytz.utc.localize(x)
>>> x - y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't subtract offset-naive and offset-aware datetimes

x is not assumed to be UTC (if it were, I would get
datetime.timedelta(0) instead of an exception).  Another example, where
the naive value is printed without any UTC offset:

>>> x.isoformat()
'2021-06-04T09:09:18.304640'
>>> y.isoformat()
'2021-06-04T09:09:18.304640+00:00'
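
The same behavior can be reproduced with only the standard library, which
may make the point easier to check.  This is just a sketch: the variable
names and the raises_type_error helper are mine (not part of any library),
and the timestamp is the one from the session above.

```python
from datetime import datetime, timedelta, timezone

# A fixed naive value (the wall-clock reading from the session above).
naive = datetime(2021, 6, 4, 9, 9, 18, 304640)
# The same fields, explicitly labelled as UTC.
aware = naive.replace(tzinfo=timezone.utc)


def raises_type_error(op):
    """Hypothetical helper: return True if calling op() raises TypeError."""
    try:
        op()
    except TypeError:
        return True
    return False


# Subtraction and ordering across naive/aware values are rejected outright.
subtract_fails = raises_type_error(lambda: naive - aware)
compare_fails = raises_type_error(lambda: naive < aware)

# Equality does not raise, but a naive value never equals an aware one.
equal = (naive == aware)

# Only once the caller supplies the zone does the arithmetic work.
delta = naive.replace(tzinfo=timezone.utc) - aware

# Field access on the naive value needs no zone at all.
hour = naive.hour

print(subtract_fails, compare_fails, equal, delta, hour)
```

So field access such as .hour is indeed independent of where the code
runs, but that is because the naive value carries no zone, not because
Python assumes UTC.  As far as I can tell from the datetime docs,
naive.timestamp() actually interprets the value as system local time.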

On Fri, Jun 4, 2021 at 7:46 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
> The learning there is: library software shouldn’t use anything from its 
> environment (time zone, locale, encoding, endianness). Functions that use 
> time zone should always have a time zone parameter.
>
> Once you take that step, the functions that work with zoneless timestamps 
> start to look different to functions that work with local timestamps, and you 
> start to realize that they should be separate data types.
>
> > On Jun 3, 2021, at 12:26 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > Arrow's decision was not to permit storage of timestamps with
> > "localized" representation (which is distinct from UTC internal
> > representation with a different time zone set). The problem really
> > comes down to the interpretation of "time zone naive" timestamps on
> > different systems: operations in my opinion should not yield different
> > results depending on the particular locale of the system where the
> > operations are being run.
> >
> > date on my Linux system returns 1622748048, which is 19:21 UTC. If you
> > encounter 1622748048 without any given time zone, and want to
> > interpret 1622748048 as CDT (US/Central where I live), then Arrow is
> > asking you to localize that timestamp to the UTC representation of
> > 19:21 CDT, which is 5 hours later, so you need to add 5 hours of
> > seconds to the timestamp to adjust it to UTC.
> >
> > In some systems, if you encounter 1622748048 without time zone
> > indicated, the behavior of timestamp_day() or timestamp_hour() will
> > depend on the system locale. We are recommending that the behavior of
> > these functions should consistently have the UTC interpretation of the
> > value rather than using the system locale. This is what Python does
> > with "tz-naive" datetime.datetime objects: if you access
> > datetime.hour on a timezone-less datetime.datetime, it will return the
> > same result no matter where in the world you are.
> >
> > On Thu, Jun 3, 2021 at 1:19 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>
> >> It seems that Arrow’s timestamp type can either have no time zone or be 
> >> UTC. I think that is a flawed design, because it doesn’t catch user errors.
> >>
> >> Suppose you want to find the number of milliseconds between two 
> >> timestamps. If the first has a timezone and the second is implicitly UTC, 
> >> then you can convert them both to instants and subtract. But if the first 
> >> has a timezone and the second has no time zone, you must supply a time 
> >> zone for the second. So, the subtraction function will have a different 
> >> signature.
> >>
> >> There are many similar operations, where a time zone needs to be supplied, 
> >> or where you cannot safely mix timestamps with different time zones.
> >>
> >> Julian
> >>
> >>
> >>> On Jun 3, 2021, at 11:07 AM, Adam Hooper <a...@adamhooper.com> wrote:
> >>>
> >>> On Thu, Jun 3, 2021 at 2:02 PM Adam Hooper <a...@adamhooper.com> wrote:
> >>>
> >>>> I understand isAdjustedToUTC=true to mean "timestamp", and
> >>>> isAdjustedToUTC=false to mean, "int64 and I hope somebody attached some
> >>>> docs because
> >>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#local-semantics-timestamps-not-normalized-to-utc
> >>>> lists a whole slew of potential meanings and without extra metadata I'll
> >>>> never be able to figure out what this column means."
> >>>>
> >>>
> >>> Correcting myself here: Parquet isAdjustedToUTC=false does have just one
> >>> meaning. It means encoding a "(year, month, day, hour, minute, second,
> >>> microsecond)" tuple as a single integer.
> >>>
> >>> Adam
> >>>
> >>> --
> >>> Adam Hooper
> >>> +1-514-882-9694
> >>> http://adamhooper.com
> >>
>
