Sgtm, I think a PMC member needs to kick it off?

On Wednesday, April 3, 2019, Wes McKinney <wesmck...@gmail.com> wrote:
> Agreed
>
> On Wed, Apr 3, 2019 at 9:53 AM Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > Option 1 sounds good to me. Let's take it to a vote.
> >
> > On Tue, Apr 2, 2019 at 8:53 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>
> >> Based on the discussion so far, my attempt at concrete Schema proposals
> >> below. Jacques I think summarizes what we've discussed, apologies if
> >> I've misunderstood. Wes, would Option 1 work to support the Pandas Time
> >> Delta use-case? I'm leaning towards Option 1 if it satisfies everyone (but
> >> happy to implement whatever we come to a consensus on).
> >>
> >> ** Option 1: New Type: **
> >> /// An absolute length of time unrelated to any calendar artifacts. For
> >> /// the purposes of Arrow Implementations, adding this value to a
> >> /// Timestamp ("t1") naively (i.e. simply summing the two numbers) is
> >> /// acceptable even though in some cases the resulting Timestamp ("t2")
> >> /// would not account for leap-seconds during the elapsed time between
> >> /// "t1" and "t2". Similarly, representing the difference between two
> >> /// Unix timestamps is acceptable, but would yield a value that is
> >> /// possibly a few seconds off from the true elapsed time.
> >> ///
> >> /// The resolution defaults to
> >> /// millisecond, but can be any of the other supported TimeUnit values as
> >> /// with Timestamp and Time types. This type is always represented as
> >> /// an 8-byte integer.
> >> table DurationInterval {
> >>   unit: TimeUnit = MILLISECOND;
> >> }
> >>
> >> ** Option 2: New TimeDelta enum on Interval Unit (strong definition around
> >> leap-seconds): **
> >>
> >> enum IntervalUnit: short { YEAR_MONTH, DAY_TIME, TIME_DELTA}
> >> // A "calendar" interval which models types that don't necessarily
> >> // have a precise duration without the context of a base timestamp (e.g.
> >> // days can differ in length during daylight saving time transitions).
> >> // In the case of TimeDelta it is possible that no precise definition
> >> // exists if the base timestamp occurs at an instant when a leap second
> >> // was added (but it would only differ by at most 1 second).
> >> // YEAR_MONTH - Indicates the number of elapsed whole months, stored as
> >> // 4-byte integers.
> >> // DAY_TIME - Indicates the number of elapsed days and milliseconds,
> >> // stored as 2 contiguous 32-bit integers (8-bytes in total). Support
> >> // of this IntervalUnit is not required for full arrow compatibility.
> >> // TIME_DELTA - Indicates absolute time difference between Unix
> >> // Timestamps (i.e. excluding leap seconds). This value is always
> >> // represented as an 8-byte integer.
> >> table Interval {
> >>   unit: IntervalUnit;
> >>   resolution: TimeUnit; // Only relevant for TIME_DELTA
> >> }
> >>
> >> On Tue, Apr 2, 2019 at 10:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> > Since there were some mentions of leap seconds:
> >> >
> >> > I think the intent of the timedelta/duration type should be to express
> >> > the difference between UNIX timestamps (from second to nanosecond
> >> > resolution), which don't include leap seconds. We use the
> >> > timedelta64[ns] type in pandas for example, which is a
> >> > nanosecond-resolution difference of UNIX timestamps.
> >> >
> >> > On Tue, Apr 2, 2019 at 10:05 AM Jacques Nadeau <jacq...@apache.org> wrote:
> >> > >
> >> > > >
> >> > > > I could go either way, it has some benefits for forward compatibility I
> >> > > > suppose, but on the other hand YAGNI, if you feel strongly, I'm ok
> >> > > > including it. However, the more optional fields we have for a specific
> >> > > > enum value, makes me lean more towards a new type instead of just an
> >> > > > enum.
> >> > > >
> >> > > I'm okay with skipping for now. Appreciate the focus on only what we
> >> > > actually need.
> >> > >
> >> > > > Could you elaborate on defining standard arithmetic conversions between
> >> > > > time-delta/duration in seconds and other time unit (days, months,
> >> > > > years) as part of the standard/format, I'm still not sure I understand
> >> > > > what the use-case is here.
> >> > > >
> >> > >
> >> > > Here goes nothing...
> >> > >
> >> > > Seems like there are two options for durations:
> >> > > 1) they aren't related to any other type
> >> > > 2) they have a relationship to timestamps and dates.
> >> > >
> >> > > If 1, then the only thing I could understand is real world duration how
> >> > > seconds are defined (and fractions thereof). E.g. [1] :D. In this
> >> > > situation, there is no way to express any unit of time of higher
> >> > > granularity than a second (e.g. days) since it is up to application
> >> > > implementer to define the relationship. This severely limits the
> >> > > expressiveness of the concept. (I can't ever use something TimeUnit.DAYS)
> >> > > and stops the ability to cover the existing interval YEAR_MONTH type I
> >> > > believe (since it has a resolution of months).
> >> > >
> >> > > If 2, then we must define the canonical value of ts + duration, otherwise
> >> > > duration are somewhat meaningless, thus the proposed translation chart
> >> > > (which causes its own oddities depending on the resolution of the time
> >> > > type you are adding to).
> >> > >
> >> > > That being said, having started to remember previous discussions on this,
> >> > > I'm most inclined to simply pick #1 and ignore the need for anything
> >> > > more. The curiousness of interval math in database systems underscores
> >> > > the fact that it apparently doesn't matter that much. In most cases,
> >> > > today + 3 months is close enough to today + 90 days for government work.
> >> > >
> >> > > Let's +2 a patch and get it merged quickly so we never have to think
> >> > > about this again :)
> >> > >
> >> > > [1] "the duration of 9,192,631,770 periods
> >> > > <https://en.wikipedia.org/wiki/Frequency> of the radiation corresponding
> >> > > to the transition between the two hyperfine levels
> >> > > <https://en.wikipedia.org/wiki/Hyperfine_structure> of the ground state
> >> > > of the caesium-133 <https://en.wikipedia.org/wiki/Caesium-133> atom"
> >> > > (at a temperature of 0 K <https://en.wikipedia.org/wiki/Absolute_zero>)
> >> > >
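
For anyone reading the thread later, here is a rough Python sketch of what the Option 1 semantics discussed above could look like in practice: a duration is a plain integer count of a fixed-length unit, and adding it to a Unix timestamp is a naive sum. The names add_duration, rescale and UNITS_PER_SECOND are made up for illustration and are not Arrow or pandas APIs. The leap-second case uses the one inserted at the end of 2016-12-31, and the last few lines illustrate the "3 months vs. 90 days" point for one specific date only.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical names for illustration; none of these come from Arrow or pandas.
UNITS_PER_SECOND = {
    "SECOND": 1,
    "MILLISECOND": 10**3,
    "MICROSECOND": 10**6,
    "NANOSECOND": 10**9,
}


def add_duration(timestamp: int, duration: int, unit: str) -> int:
    """Naively sum a Unix timestamp and a duration, both integer counts of `unit`."""
    if unit not in UNITS_PER_SECOND:
        raise ValueError(f"unsupported unit: {unit}")
    return timestamp + duration


def rescale(duration: int, from_unit: str, to_unit: str) -> int:
    """Convert a duration between fixed-length units (floor division when coarsening)."""
    return duration * UNITS_PER_SECOND[to_unit] // UNITS_PER_SECOND[from_unit]


# A leap second (23:59:60 UTC) was inserted at the end of 2016-12-31.
t1 = int(datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc).timestamp())
t2 = int(datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc).timestamp())

unix_delta = t2 - t1  # difference of Unix timestamps ignores the leap second
si_elapsed = unix_delta + 1  # the true elapsed time includes the leap second

# Calendar intervals have no fixed length: 3 months vs. 90 days from the same day.
start = date(2019, 4, 3)
plus_90_days = start + timedelta(days=90)
plus_3_months = start.replace(month=start.month + 3)  # valid for this date only
```

With these definitions the Unix difference across the leap second is one second short of the true elapsed time, which is the "few seconds off" caveat in the Option 1 comment, and the two calendar results land one day apart. The unit table deliberately stops at seconds, since anything coarser (days, months) would need a base timestamp to have a precise length.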