On Tue, 15 Jun 2021 at 10:11, Antoine Pitrou <anto...@python.org> wrote:
>
>
> On 15/06/2021 at 09:31, Joris Van den Bossche wrote:
> >
> > (but I also don't fully understand your point here, as your "they
> > would get the correct histogram" seems to imply a positive statement
> > for tz-naive timestamps, while your email starts with a +1 on
> > Antoine's proposal which, as far as I understand it, says that
> > timestamps without timezone are useless / should be interpreted as UTC
> > instead (which makes your above described scenario impossible)).
>
> My proposal is that timestamps without timezone should be interpreted as
> UTC. I don't get how that makes them "useless". In my view, that makes
> them far more useful than if we don't know their base of reference
> (because then most operations you can do on them will give
> uninterpretable data).
>

Note that the "useless" was your wording about my interpretation of
timestamps without timezone as "unknown local timezone" (so my above
statement should probably have been phrased as ".. are either useless
or should be interpreted as UTC").

So I didn't want to imply that interpreting timestamps without timezone
as UTC is useless. That's certainly a clear interpretation (and a useful
abstraction, given earlier references to Java's "instant", which is kind
of similar AFAIU), but it's a *different* interpretation than how I
understand the current spec, and changing our interpretation has
consequences.

First, there are systems that have the notion of tz-naive local
timestamps / TIMESTAMP WITHOUT TIMEZONE (and that do not interpret them
as UTC). Some examples I am aware of are pandas, most database systems
(although with varying names), Joda-Time's LocalDateTime, etc. If we
want to support those systems, Arrow needs to have an equivalent
timezone-less type. To quote Wes from his last email about dropping the
timezone-less timestamp: "I don't think that is something we can do at
this time lest we lose the ability to have high-fidelity
interoperability with other systems."

In addition, I will continue to argue that, depending on your
application, it *can* be reasonable to work with timestamps without a
timezone. Certainly, such timestamps don't contain information about the
absolute point in time, and thus are inherently ambiguous for certain
operations. But as Wes mentioned before, there are still many analytical
operations that you can do on timezone-less data without any problem or
ambiguity (such as aggregating by year or month, or even by the hour of
the day).

Joris
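
P.S. To make the last point a bit more concrete, here is a minimal sketch using only Python's standard library (not Arrow or pandas). The timestamps and the helper names `count_by_hour` / `count_by_month` are made up for illustration. It shows that grouping tz-naive values by wall-clock fields needs no timezone, while turning the same value into an absolute instant depends on which timezone you assume.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical tz-naive "wall clock" timestamps (no tzinfo attached).
events = [
    datetime(2021, 6, 14, 9, 5),
    datetime(2021, 6, 14, 9, 40),
    datetime(2021, 6, 14, 17, 15),
    datetime(2021, 6, 15, 9, 20),
]


def count_by_hour(timestamps):
    """Count events per hour of day, using only the wall-clock fields."""
    return Counter(ts.hour for ts in timestamps)


def count_by_month(timestamps):
    """Count events per (year, month), again without any timezone."""
    return Counter((ts.year, ts.month) for ts in timestamps)


hourly = count_by_hour(events)
monthly = count_by_month(events)

# By contrast, mapping a naive value to an absolute instant requires an
# assumed timezone, and different assumptions give different instants.
first = events[0]
as_utc = first.replace(tzinfo=timezone.utc).timestamp()
as_plus2 = first.replace(tzinfo=timezone(timedelta(hours=2))).timestamp()
instant_gap_seconds = as_utc - as_plus2
```

The aggregations are well defined whatever the (unknown) local timezone was; only the conversion to an absolute instant is sensitive to the assumption.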