It looks good to me.

Rok

On Mon, Sep 20, 2021 at 2:36 PM Antoine Pitrou <anto...@python.org> wrote:

>
> All, can you please take a look at QP's PR at
> https://github.com/apache/arrow/pull/11138 ?
>
> I don't believe this requires a vote as this clarification is consistent
> with the already clarified semantics for Time and Timestamp types.  The
> current PR contents are ready for a merge, and I think they can be
> merged soon if nobody opposes.
>
> Regards
>
> Antoine.
>
>
> Le 17/09/2021 à 05:33, QP Hou a écrit :
> > Thank you for your feedback, Weston and Antoine. I agree that the ordering
> > discussion should be out of scope for the Arrow format spec. I have
> > removed the reference to ordering in the PR, so now the only change is
> > mentioning leap seconds to keep it consistent with other temporal
> > types.
> >
> > I would like to add that even though we are not explicitly discussing
> > ordering in the spec, any kind of restriction we assign to a type
> > would still implicitly impact ordering in downstream compute kernels.
> > This is why I also took the discussion of leap days out of my PR.
> >
> > Thanks,
> > QP
> >
> > On Tue, Sep 14, 2021 at 12:46 AM Antoine Pitrou <anto...@python.org> wrote:
> >>
> >>
> >> I agree with Weston that ordering isn't in the scope for the Arrow
> >> format spec (*).  For example, implementations are free to define UTF8
> >> comparisons and ordering as they wish (some may want to invest in the
> >> complexity of the official Unicode collation algorithm, others may be
> >> content with a simple codepoint-wise lexicographic comparison).  It
> >> doesn't prevent them from exchanging UTF8 data unambiguously using Arrow.
> >>
> >> (*) It may be in the scope for a hypothetical Compute IR spec, however.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> Le 14/09/2021 à 07:16, QP Hou a écrit :
> >>> Good point Weston. My proposal was written with the impression that
> >>> Arrow does want to define semantics for some of these temporal types
> >>> based on the existing comments in the Schema.fbs file.
> >>>
> >>> For example, here is a quote taken from the comments for the Time type:
> >>>
> >>> /// This definition doesn't allow for leap seconds. Time values from
> >>> /// measurements with leap seconds will need to be corrected when ingesting
> >>> /// into Arrow (for example by replacing the value 86400 with 86399).
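The correction described in that Schema.fbs comment can be sketched as follows. This is a minimal illustration, assuming a Time value stored in seconds since midnight; the helper name `clamp_leap_second` is hypothetical and not part of any Arrow API.

```python
SECONDS_PER_DAY = 86_400


def clamp_leap_second(seconds_of_day: int) -> int:
    """Map a leap-second reading (86400) onto the last regular second (86399).

    Hypothetical helper: assumes the value is a time of day in whole seconds.
    """
    if seconds_of_day < 0 or seconds_of_day > SECONDS_PER_DAY:
        raise ValueError(f"not a valid time of day: {seconds_of_day}")
    # Only the leap-second value 86400 is changed; all others pass through.
    return min(seconds_of_day, SECONDS_PER_DAY - 1)
```

A producer would apply something like this at ingestion time, before the values are written into an Arrow Time column.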
> >>>
> >>> Here is another quote for the Date type:
> >>>
> >>> /// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
> >>> /// leap seconds), where the values are evenly divisible by 86400000
> >>>
> >>> For the interval type, we have:
> >>>
> >>> // A "calendar" interval which models types that don't necessarily
> >>> // have a precise duration without the context of a base timestamp (e.g.
> >>> // days can differ in length during day light savings time transitions).
> >>>
> >>> I think pushing the responsibility to define these semantics to the
> >>> data producer side is also a perfectly fine design with its own
> >>> trade-offs. It would make data exchange between two different systems
> >>> a little bit harder because consumers need to be aware of the
> >>> semantics defined by the producer. On the other hand, it does make the
> >>> producer implementation easier. It also makes data exchange within the
> >>> same system more efficient if that system's temporal type semantics are
> >>> different from what's defined in Arrow's spec.
> >>>
> >>> Either way, I think it would be good if we can be consistent on our
> >>> temporal type semantics in the spec. If we are making the claim that
> >>> leap seconds should not be taken into account for Time, Timestamp and
> >>> Date types, then it seems natural to make this claim for Interval type
> >>> as well. Alternatively, we could update the spec to make all temporal
> >>> types leap-second agnostic.
> >>>
> >>> On Mon, Sep 13, 2021 at 12:03 PM Weston Pace <weston.p...@gmail.com> wrote:
> >>>>
> >>>> One could define a sorting based on 30-day months, 365-day years, and
> >>>> 24-hour days.  It would be consistent but can lead to some surprising
> >>>> results.  It appears that this is what postgres does as I got the
> >>>> following ordering for an interval:
> >>>>
> >>>> 359 days, 12 months, 360 days, 1 year, 365 days, 366 days
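One way to reproduce an ordering like the one quoted above is to flatten each interval to an approximate day count using 30-day months. The sketch below is an illustration only, not PostgreSQL's actual implementation; the `Interval` tuple and `approx_days` helper are hypothetical, and representing "1 year" as 12 months is an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    label: str
    months: int
    days: int


def approx_days(interval: Interval) -> int:
    """Flatten an interval to days, assuming every month has 30 days."""
    return interval.months * 30 + interval.days


intervals = [
    Interval("366 days", 0, 366),
    Interval("12 months", 12, 0),
    Interval("365 days", 0, 365),
    Interval("360 days", 0, 360),
    Interval("359 days", 0, 359),
    Interval("1 year", 12, 0),  # assumption: a year is stored as 12 months
]

# sorted() is stable, so intervals with equal keys keep their input order.
ordered = [interval.label for interval in sorted(intervals, key=approx_days)]
print(ordered)
```

Under this rule "12 months", "360 days", and "1 year" all compare equal, which is why a full year sorts before 365 days.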
> >>>>
> >>>> On the other hand, Joda time forbids comparison of periods (their
> >>>> version of what we call an interval) and offers three ways to convert
> >>>> to a duration.  There is toDurationFrom(instant),
> >>>> toDurationTo(instant) which give durations from specific calendar
> >>>> ranges and then there is toStandardDuration() which converts to a
> >>>> duration based on 24 hour days.  However, toStandardDuration will
> >>>> still fail if the period has >0 months or years (presumably because
> >>>> months and years are too inconsistent).
> >>>>
> >>>> I'm not sure though that this is something that Arrow needs to define.
> >>>> We aren't specifying any invalid ranges of values.  I don't foresee
> >>>> any interoperability concerns.  A system that treated intervals as
> >>>> comparable (and didn't factor in DST, leap years, etc.) will read and
> >>>> write intervals the same way as a system that considers intervals
> >>>> incomparable.
> >>>>
> >>>> This question seems to fall into the "compute" space inhabited by
> >>>> topics like "is 'false && null' a false value or a null value" and
> >>>> "should addition overflow or throw an exception".
> >>>>
> >>>> On Mon, Sep 13, 2021 at 6:23 AM QP Hou <houqp....@gmail.com> wrote:
> >>>>>
> >>>>> On Mon, Sep 13, 2021 at 6:18 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>> The Duration type is defined with a TimeUnit.  You are probably thinking
> >>>>>> about the Interval type.
> >>>>>>
> >>>>>
> >>>>> Oops, my bad, yes, it should be Interval type not Duration.
> >>>>>
> >>>>>> Ok.  How about daylight savings? I suppose they are taken into account
> >>>>>> as well.
> >>>>>>
> >>>>>
> >>>>> Yes, the day components in both DAY_TIME and MONTH_DAY_NANO take
> >>>>> daylight savings into account.
>
