What about adding a canonical extension type so teams using Arrow don't
have to keep re-inventing timestamps and duration types?

These could use Decimal128 as the storage type, since we are missing
128-bit integers (another debate).
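
For a rough sense of the ranges mentioned in the thread below, here is a
back-of-the-envelope sketch in Python. The constant names are mine, the
year length is the mean Gregorian year (an approximation), and I am
assuming Decimal128 at its maximum precision of 38 digits with scale 0.
It is only arithmetic, not a proposal for how the storage should work.

```python
# Rough range arithmetic for nanosecond durations (illustrative only).

NANOS_PER_SECOND = 10**9
# Assumption: mean Gregorian year, 365.2425 days * 86400 s = 31,556,952 s.
SECONDS_PER_YEAR = 31_556_952
NANOS_PER_YEAR = SECONDS_PER_YEAR * NANOS_PER_SECOND

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
# Largest unscaled value of a decimal with precision 38 and scale 0.
DECIMAL128_MAX = 10**38 - 1

# Nanoseconds needed to cover +/- 10000 years.
needed_nanos = 10_000 * NANOS_PER_YEAR

# Whole years a signed 64-bit nanosecond count can cover in one direction.
int64_years = INT64_MAX // NANOS_PER_YEAR

# Years covered by INT32_MAX days (the days field of MonthDayNano).
int32_day_years = INT32_MAX / 365.2425

print("nanoseconds needed for 10000 years:", needed_nanos)
print("fits in int64:", needed_nanos <= INT64_MAX)
print("fits in Decimal128(38, 0):", needed_nanos <= DECIMAL128_MAX)
print("years covered by int64 nanoseconds:", int64_years)
print("years covered by INT32_MAX days:", round(int32_day_years))
```

Under those assumptions, a 64-bit nanosecond duration falls well short of
10000 years, while a 38-digit decimal has plenty of headroom.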

--
Felipe

On Sun, Jun 22, 2025 at 9:48 AM Antoine Pitrou <anto...@python.org> wrote:

>
> The Arrow format defines data types; it doesn't mandate particular
> semantics for the operations that one might want to implement on them.
>
> An Arrow implementation could expose different operations (e.g. compute
> functions in Arrow C++ parlance) depending on the exact
> duration/interval addition desired.
>
> So I don't understand the need to add more interval/duration types to
> Arrow. I'm also extremely lukewarm towards adding new data types to
> Arrow. We are already in a situation where the most recent data types
> (binary view, list view, run-end encoding) are very poorly supported
> across the ecosystem.
>
> Regards
>
> Antoine.
>
>
>
> On 22/06/2025 at 02:34, David Li wrote:
> > MonthDayNano in Arrow uses calendar days, but as noted Iceberg's
> > proposed DAY_TIME interval is a duration in Arrow parlance. So if you add
> > "1 day" in Iceberg (which is actually definitionally exactly 86400 seconds)
> > to a timestamp right before a DST transition, you will be off by an hour
> > compared to if you did the same in Arrow.
> >
> > On Sun, Jun 22, 2025, at 04:54, Dewey Dunnington wrote:
> >> I may be misunderstanding the MonthDayNano type, but I think it gives a
> >> range of roughly +/- INT32_MAX days (5.8 million years?) at nanosecond
> >> precision without considering the months component?
> >>
> >> On Sat, Jun 21, 2025 at 12:58 PM David Li <lidav...@apache.org> wrote:
> >>
> >>> Hello Arrow devs,
> >>>
> >>> There's an ongoing discussion in Iceberg [1] and Parquet [2] to define
> >>> and standardize new interval types. Of course, it would be ideal if
> >>> these new types had a canonical representation in Arrow. YEAR_MONTH is
> >>> the same as Arrow's month interval; DAY_TIME, however, is actually a
> >>> 128-bit nanosecond duration, and hence I don't think it can be
> >>> represented by MonthDayNano or the duration type.
> >>>
> >>> It might be interesting to consider whether there's some other way to
> >>> encode this type in Arrow (or whether an extension type should be
> >>> considered), or to find a way to define it that would more easily map
> >>> onto an existing type (while still meeting the Iceberg goal of being
> >>> ANSI SQL compatible, which apparently requires +/- 10000 years of
> >>> range).
> >>>
> >>> [1]: https://lists.apache.org/thread/65sxmjcfpvbp262dh73v5m4zjdgzt7j1
> >>> [2]: https://github.com/apache/parquet-format/pull/496
> >>>
> >>> -David
>
>
