Hi, I would like to draw some attention to a format PR aiming to clarify leap seconds, leap days and daylight saving handling semantics for duration types: https://github.com/apache/arrow/pull/11138.
This came out of the effort [1] to implement partial and total order for the duration types DAY_TIME and MONTH_DAY_NANO. In short, I am proposing that we clarify the following in the spec:

* For DAY_TIME durations, similar to Time and Timestamp, we do not take leap seconds into account, but we do take daylight saving into account. As a result, days=1,ms=86400000 does not equal days=2,ms=0.
* For MONTH_DAY_NANO, we do not take leap seconds into account, but we do take leap days into account. Whether we take leap days into account doesn't have a big impact here, because the number of days in a month already varies even without leap days.

One consequence is that we will not be able to define a total order for either DAY_TIME or MONTH_DAY_NANO durations. Similar to floating point values, we will only be able to define a partial order for these two types. This impacts downstream sorting compute kernels, because we can't simply sort these values lexicographically by their raw int tuples.

Another consequence is that normalization cannot be applied to either type, i.e. we can't normalize days=1,ms=86400000 into days=2, or months=1,days=30 into months=2. This could simplify downstream hash aggregate/join compute kernels, because we can just hash the raw int tuples to generate the hash keys.

[1]: https://github.com/jorgecarleitao/arrow2/pull/398

Thanks,
QP
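
P.S. To make the partial-order point concrete, here is a rough Python sketch of how a DAY_TIME comparison could behave. It is only an illustration, not what the PR or arrow2 implements: the names DayTime, bounds_ms and partial_cmp are hypothetical, and the assumption that a calendar day lasts between 23 and 25 hours under daylight saving is mine, not something the spec states.

```python
from typing import NamedTuple, Optional, Tuple

MS_PER_HOUR = 3_600_000
# Assumption (not from the spec): with daylight saving shifts,
# one calendar day lasts between 23 and 25 hours.
MIN_DAY_MS = 23 * MS_PER_HOUR
MAX_DAY_MS = 25 * MS_PER_HOUR


class DayTime(NamedTuple):
    """Hypothetical stand-in for a DAY_TIME value: raw (days, ms) ints."""
    days: int
    ms: int

    def bounds_ms(self) -> Tuple[int, int]:
        """Shortest and longest elapsed time this value could cover."""
        candidates = (self.days * MIN_DAY_MS, self.days * MAX_DAY_MS)
        # min/max handles negative day counts, where the bounds flip.
        return min(candidates) + self.ms, max(candidates) + self.ms


def partial_cmp(a: DayTime, b: DayTime) -> Optional[int]:
    """Return -1, 0 or 1 when a and b are ordered, None when they are not."""
    if a == b:  # equality is on the raw tuple, with no normalization
        return 0
    a_lo, a_hi = a.bounds_ms()
    b_lo, b_hi = b.bounds_ms()
    if a_hi < b_lo:
        return -1
    if a_lo > b_hi:
        return 1
    return None  # the possible ranges overlap, so no order is defined
```

With this sketch, days=1,ms=86400000 and days=2,ms=0 are neither equal nor ordered, so partial_cmp returns None for that pair, while days=1 still compares less than days=3. Because equality is on the raw tuple, the two values also stay distinct as hash keys.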