Re: [DISCUSS] Proposal: Split INTERVAL into INTERVAL_DATE, INTERVAL_TIME, INTERVAL_DATETIME

Wail Alkowaileet Wed, 21 Jan 2026 23:15:11 -0800

Thanks Ritik!

I just want to point out that we already do something similar for the
"duration" type
<https://asterixdb.apache.org/docs/0.9.9/datamodel.html#PrimitiveTypesDuration>:


> There are also two sub-duration types, namely year_month_duration and
> day_time_duration. year_month_duration represents only the years and
> months of a duration, while day_time_duration represents only the day to
> millisecond fields. Different from the duration type, both these two
> subtypes are totally ordered, so they can be used for comparison and index
> construction.
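
As a small illustration of why the general duration type is not totally
ordered, here is a minimal Java sketch. It uses java.time rather than
AsterixDB's own classes, and the class and method names are hypothetical.
It compares "1 month" against "30 days" from two different anchor dates and
gets opposite answers:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;

final class DurationOrderingSketch {
    // Returns the sign of (anchor + months) compared to (anchor + days).
    // The result depends on the anchor, so a duration that mixes months
    // and days has no anchor-independent order.
    static int compareMonthsToDays(LocalDateTime anchor, int months, int days) {
        LocalDateTime afterMonths = anchor.plus(Period.ofMonths(months));
        LocalDateTime afterDays = anchor.plus(Duration.ofDays(days));
        return Integer.signum(afterMonths.compareTo(afterDays));
    }

    public static void main(String[] args) {
        LocalDateTime jan = LocalDateTime.of(2026, 1, 1, 0, 0);
        LocalDateTime feb = LocalDateTime.of(2026, 2, 1, 0, 0);
        // January has 31 days, so one month is longer than 30 days here.
        System.out.println(compareMonthsToDays(jan, 1, 30));
        // February 2026 has 28 days, so one month is shorter than 30 days.
        System.out.println(compareMonthsToDays(feb, 1, 30));
    }
}
```

A value that holds only months, or only days down to milliseconds, can be
reduced to a single number, which is why the two sub-types can be compared
and indexed.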


That said, it is hard (if possible at all) to define the semantics of
ordering and comparison for intervals. From a columnar perspective, though,
the current design is a bit inefficient, as we stuff different value types
(date, time, and datetime) into the same column. This might throw off the
encoder as it tries to squeeze the bits of differently typed values.
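
To make the encoding concern concrete, here is a rough Java sketch of the
bit width a bit-packing encoder would need when the three chronon kinds
share one column. It assumes date chronons are days since the epoch, time
chronons are milliseconds since midnight, and datetime chronons are
milliseconds since the epoch. The class and method names are hypothetical,
and this is not the actual encoder:

```java
final class MixedColumnWidthSketch {
    // Bits needed to bit-pack a non-negative value.
    static int bitsNeeded(long maxValue) {
        return maxValue == 0 ? 1 : 64 - Long.numberOfLeadingZeros(maxValue);
    }

    // Bits per value for a column, driven by its largest value.
    // Assumes all values are non-negative.
    static int bitsForColumn(long... values) {
        long max = 0;
        for (long v : values) {
            max = Math.max(max, v);
        }
        return bitsNeeded(max);
    }

    public static void main(String[] args) {
        long date = 20_474L;                 // assumed: days since epoch (2026-01-21)
        long time = 86_399_999L;             // assumed: ms since midnight (23:59:59.999)
        long datetime = 1_768_953_600_000L;  // assumed: ms since epoch (2026-01-21T00:00)

        System.out.println("date only:     " + bitsForColumn(date));
        System.out.println("time only:     " + bitsForColumn(time));
        System.out.println("datetime only: " + bitsForColumn(datetime));
        // One wide value forces the wide encoding on every value in the column.
        System.out.println("mixed column:  " + bitsForColumn(date, time, datetime));
    }
}
```

With these sample values, the date-only column packs into 15 bits per
value, the time-only column into 27, and the datetime-only column into 41.
The mixed column needs 41 bits for every value, including the dates.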

On a different note, I can see the benefit of such composite types (e.g.,
interval and point). However, users usually model such data as separate
columns (at least that's what I've seen in the wild): for example, (lat,
lon) as opposed to point(x, y), or (start, end) as opposed to interval(s,
e). This separation can be beneficial from a performance perspective: I can
read one value instead of two whenever I need just one of them (e.g., the
top three interval ends with the largest values). I can also use the
min/max filters whenever I want to do a comparison (e.g., searching for all
intervals that start after a given point in time, or all points below the
equator). But that comes with the overhead of maintaining two different
columns, and it allows semantically wrong values (e.g., an interval without
an end or a point that is missing the latitude value).
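
The min/max filter point can be sketched as follows. This is a simplified
Java illustration with hypothetical names. It assumes the start values live
in their own column of long chronons split into pages, and that a max value
is available per page. It is not how the storage layer is actually
implemented:

```java
import java.util.ArrayList;
import java.util.List;

final class StartColumnFilterSketch {
    // Largest start value in one page. A real filter would keep this next
    // to the page instead of recomputing it.
    static long maxOf(long[] page) {
        long max = Long.MIN_VALUE;
        for (long v : page) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Indexes of pages that may hold an interval starting after `point`.
    // Pages whose max start is <= point can be skipped without being read.
    static List<Integer> pagesToScan(long[][] startPages, long point) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < startPages.length; i++) {
            if (maxOf(startPages[i]) > point) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        long[][] startPages = { {1, 5}, {10, 20}, {3, 8} };
        System.out.println(pagesToScan(startPages, 7)); // prints [1, 2]
    }
}
```

Only the start column is touched here; the end column is never read, which
is the "one value instead of two" benefit mentioned above.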

On Wed, Jan 21, 2026 at 6:14 AM Ritik Raj <[email protected]> wrote:

> Hi Team,
>
> I’d like to start a discussion around the current design of the INTERVAL
> type in AsterixDB and propose splitting it into three distinct types:
>
>
>    - INTERVAL_DATE
>    - INTERVAL_TIME
>    - INTERVAL_DATETIME
>
>
> *Background*
>
> Today, INTERVAL is effectively an overloaded type whose semantics depend on
> the underlying endpoint types (DATE, TIME, or DATETIME). This is visible,
> for example, in AIntervalConstructorDescriptor, where the interval’s
> behavior and internal representation are determined dynamically based on
> the serialized type tag of the inputs:
>
> ```
> switch (intervalType) {
>     case DATE:
>         intervalStart = ADateSerializerDeserializer.getChronon(...);
>         intervalEnd   = ADateSerializerDeserializer.getChronon(...);
>         break;
>     case TIME:
>         intervalStart = ATimeSerializerDeserializer.getChronon(...);
>         intervalEnd   = ATimeSerializerDeserializer.getChronon(...);
>         break;
>     case DATETIME:
>         intervalStart = ADateTimeSerializerDeserializer.getChronon(...);
>         intervalEnd   = ADateTimeSerializerDeserializer.getChronon(...);
>         break;
>     ...
> }
> ```
>
> As a result:
>
>
>    - A single INTERVAL type can represent *date intervals*, *time intervals*,
>      or *datetime intervals*
>    - The physical width of endpoints differs (DATE/TIME are 4 bytes, DATETIME
>      is 8 bytes)
>    - Semantics such as ordering, comparison, and statistics are inherently
>      type-dependent
>
> *Motivation*
>
> This overloading creates several challenges:
>
>
>    1. *Comparability and ordering*
>       - Intervals are only meaningfully comparable when their endpoint
>         domains match
>       - A generic INTERVAL type prevents us from expressing this at the type
>         level
>    2. *Optimizer & storage implications*
>       - Min/max statistics and ordering assumptions are unclear or unsafe for
>         mixed-interval semantics
>       - Filter pushdown and reasoning become more complex than necessary
>    3. *Type safety & clarity*
>       - The interval’s actual semantics are implicit, not explicit
>
>
> Conceptually, INTERVAL today behaves like three distinct types sharing a
> constructor, rather than a single coherent type.
>
>
> *Proposal*
>
> Introduce three explicit interval types:
>
>
>    - INTERVAL_DATE → interval between DATE values
>    - INTERVAL_TIME → interval between TIME values
>    - INTERVAL_DATETIME → interval between DATETIME values
>
> Each would:
>
>
>    - Have well-defined ordering and comparison semantics within its domain
>    - Make type errors visible earlier and simplify reasoning across the
>      engine
>
>
> I’m happy to prototype if there’s agreement on the direction.
>
> Looking forward to feedback and discussion.
>
> Best regards,
>
> Ritik
>


-- 

*Regards,*
Wail Alkowaileet
