As I understand it, the proposal is to have both an interval data type[1] and a 
timedelta type[2].  The interval is compatible with the SQL standard (but not 
Postgres) and can be implemented with a single numeric value representing a 
particular time unit (year, month, day, hour, minute, second, and possibly 
fractional seconds); timedelta is an array of numeric values, one for each of a 
set of time units.

I think we should have both, and operators to convert between them. Interval is 
certainly efficient, and is what some applications need, but some applications 
need timedelta.
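
As a rough sketch of what such conversion operators might look like, here is a
Python illustration that converts between a single-value form and a
two-component days/milliseconds form. All names below are hypothetical and are
not part of Arrow; the sketch assumes every day is exactly 24 hours, and it does
not cover year/month fields, which have no fixed length.

```python
from typing import NamedTuple

# Hypothetical names for illustration only; none of this is Arrow API.
# Assumption: a day is always exactly 24 hours (no DST or leap seconds).
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


class DayTime(NamedTuple):
    """Two-component value: whole days plus milliseconds within the day."""
    days: int
    millis: int


def to_single_value(value: DayTime) -> int:
    """Collapse (days, millis) into one count of elapsed milliseconds."""
    return value.days * MILLIS_PER_DAY + value.millis


def to_day_time(total_millis: int) -> DayTime:
    """Split elapsed milliseconds into (days, millis).

    divmod floors toward negative infinity, so millis is always in
    the range 0 <= millis < MILLIS_PER_DAY, even for negative input.
    """
    days, millis = divmod(total_millis, MILLIS_PER_DAY)
    return DayTime(days, millis)
```

The single-value form makes arithmetic and comparison a plain integer
operation, while the two-component form keeps the day count separate for
applications that care about it.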

Julian

[1] https://issues.apache.org/jira/browse/ARROW-352

[2] https://issues.apache.org/jira/browse/ARROW-835

> On Nov 4, 2017, at 1:26 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> It seems like we don't have enough input on this topic to make a
> decision right now. I placed the JIRA ARROW-352 in the 0.9.0
> milestone, but we really should try to get this done soon so that
> downstream users are not blocked on using Arrow to send around
> interval data.
> 
> - Wes
> 
> On Fri, Oct 20, 2017 at 12:34 AM, Li Jin <ice.xell...@gmail.com> wrote:
>> +1 on this one.
>> 
>> My reason is that this makes timestamp/interval calculation faster, i.e.,
>> "timestamp + interval < timestamp" should be faster without dealing with
>> two components in the interval. However, I am not quite sure about the
>> rationale behind the two-component representation, which seems to be what
>> is used in Spark:
>> 
>> https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
>> 
>> I am interested in hearing the reasoning behind the two components.
>> 
>> On Wed, Oct 18, 2017 at 8:32 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>> 
>>> I opened this patch over 2 months ago to add some additional metadata
>>> for intervals:
>>> 
>>> https://github.com/apache/arrow/pull/920
>>> 
>>> Java supports a two-component DAY_TIME interval type as a combo of
>>> days and milliseconds:
>>> 
>>> https://github.com/apache/arrow/blob/402baa4ec391b61dd37c770ae7978d51b9b550fa/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L106
>>> 
>>> I propose that we change the interval representation to be a number of
>>> elapsed units of time from a particular point in time. The unit
>>> choices would be the same as our units for timestamps, so an interval
>>> can be viewed as a delta between two timestamps of some resolution
>>> (second through nanoseconds) [1].
>>> 
>>> As context, a number of systems I have worked with deal in absolute
>>> time deltas. In pandas, for example, the difference of timestamps
>>> (datetime64 values) is a timedelta:
>>> 
>>> In [1]: import pandas as pd
>>> 
>>> In [2]: dr1 = pd.date_range('1/1/2000', periods=5)
>>> 
>>> In [3]: dr2 = pd.date_range('1/2/2000', periods=5)
>>> 
>>> In [4]: dr1 - dr2
>>> Out[4]: TimedeltaIndex(['-1 days', '-1 days', '-1 days', '-1 days',
>>> '-1 days'], dtype='timedelta64[ns]', freq=None)
>>> 
>>> In [5]: (dr1 - dr2).values
>>> Out[5]:
>>> array([-86400000000000, -86400000000000, -86400000000000, -86400000000000,
>>>       -86400000000000], dtype='timedelta64[ns]')
>>> 
>>> We need to be able to represent this data coherently (up to nanosecond
>>> resolution) with the Arrow metadata, and we will also at some point
>>> need to perform analytics directly on this data type.
>>> 
>>> An alternative proposal to changing the DAY_TIME interval
>>> representation is to add another kind of interval type, so instead of
>>> only YEAR_MONTH and DAY_TIME, we have TIMEDELTA. The downside of this,
>>> of course, is the extra implementation complexity. DAY_TIME with the
>>> current Java representation also seems to me to be a subset of what
>>> you can represent with TIMEDELTA.
>>> 
>>> It would be great to make a decision about this so we can get this
>>> metadata finalized in the 0.8.0 release.
>>> 
>>> Thanks
>>> Wes
>>> 
>>> [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L135
>>> 
