I opened this patch over 2 months ago to add some additional metadata for intervals:
https://github.com/apache/arrow/pull/920

Java supports a two-component DAY_TIME interval type as a combination of days and milliseconds:

https://github.com/apache/arrow/blob/402baa4ec391b61dd37c770ae7978d51b9b550fa/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L106

I propose that we change the interval representation to be a number of elapsed units of time from a particular point in time. The unit choices would be the same as our units for timestamps, so an interval can be viewed as a delta between two timestamps of some resolution (seconds through nanoseconds) [1].

As context, a number of systems I have worked with deal in absolute time deltas. In pandas, for example, the difference of timestamps (datetime64 values) is a timedelta:

    In [1]: import pandas as pd

    In [2]: dr1 = pd.date_range('1/1/2000', periods=5)

    In [3]: dr2 = pd.date_range('1/2/2000', periods=5)

    In [4]: dr1 - dr2
    Out[4]: TimedeltaIndex(['-1 days', '-1 days', '-1 days', '-1 days', '-1 days'], dtype='timedelta64[ns]', freq=None)

    In [5]: (dr1 - dr2).values
    Out[5]:
    array([-86400000000000, -86400000000000, -86400000000000,
           -86400000000000, -86400000000000], dtype='timedelta64[ns]')

We need to be able to represent this data coherently (up to nanosecond resolution) with the Arrow metadata, and at some point we will also need to perform analytics directly on this data type.

An alternative to changing the DAY_TIME interval representation is to add another kind of interval type, so that instead of only YEAR_MONTH and DAY_TIME, we also have TIMEDELTA. The downside of this, of course, is the extra implementation complexity. DAY_TIME with the current Java representation also seems to me to be a subset of what you can represent with TIMEDELTA.

It would be great to make a decision about this so we can get this metadata finalized in the 0.8.0 release.

Thanks
Wes

[1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L135
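
To make the "DAY_TIME is a subset of TIMEDELTA" point concrete, here is a minimal sketch in plain Python of collapsing a (days, milliseconds) pair into a single integer count of a chosen unit. The function name, the unit strings, and the assumption that a day is always exactly 86,400 seconds are mine for illustration; none of this is existing Arrow API.

```python
# Hypothetical helper, not part of Arrow: names and unit strings are illustrative.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}
SECONDS_PER_DAY = 86_400  # assumes fixed 24-hour days (no DST or leap seconds)


def day_time_to_timedelta(days, millis, unit="ns"):
    """Collapse a (days, milliseconds) pair into one integer count of `unit`."""
    per_second = UNITS_PER_SECOND[unit]
    total_millis = days * SECONDS_PER_DAY * 1_000 + millis
    if per_second >= 1_000:
        # Millisecond or finer resolution: the conversion is always exact.
        return total_millis * (per_second // 1_000)
    # Second resolution: exact only when the value is a whole number of seconds.
    if total_millis % 1_000 != 0:
        raise ValueError("value is not representable at second resolution")
    return total_millis // 1_000


# Matches the pandas timedelta64[ns] value shown earlier for '-1 days'.
print(day_time_to_timedelta(-1, 0, "ns"))
```

Under those assumptions, every DAY_TIME value maps to a TIMEDELTA value at millisecond resolution or finer, while the reverse mapping loses sub-millisecond precision.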