The Arrow format defines data types; it doesn't mandate particular semantics for the operations that one might want to implement on them.

An Arrow implementation could expose different operations (e.g. compute functions in Arrow C++ parlance) depending on the exact duration/interval addition desired.

So I don't understand the need to add more interval/duration types to Arrow. I'm also extremely lukewarm towards adding new data types to Arrow. We are already in a situation where the most recent data types (binary view, list view, run-end encoding) are very poorly supported across the ecosystem.

Regards

Antoine.



On 22/06/2025 at 02:34, David Li wrote:
MonthDayNano in Arrow uses calendar days, but as noted Iceberg's proposed DAY_TIME 
interval is a duration in Arrow parlance. So if you add "1 day" in Iceberg 
(which is by definition exactly 86400 seconds) to a timestamp right before a
DST transition, you will be off by an hour compared to if you did the same in Arrow.
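
To make the one-hour discrepancy concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two US Eastern offsets are hard-coded so that no time zone database is needed, and the names start, calendar_result and duration_result are hypothetical. It does not use any Arrow or Iceberg API.

```python
from datetime import datetime, timedelta, timezone

# Hard-coded US Eastern offsets (an assumption made for this sketch so that
# it does not depend on a time zone database being installed).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# Noon local time on 2025-03-08, the day before DST begins in this zone.
start = datetime(2025, 3, 8, 12, 0, tzinfo=EST)

# Calendar-day semantics: advance the wall-clock date by one day.
# By then the zone is on EDT, so the new offset is attached.
calendar_result = (start.replace(tzinfo=None) + timedelta(days=1)).replace(tzinfo=EDT)

# Fixed-duration semantics: advance the instant by exactly 86400 seconds,
# then express it in the offset that is in force afterwards.
duration_result = (start + timedelta(seconds=86_400)).astimezone(EDT)

difference = duration_result - calendar_result

print(calendar_result.isoformat())
print(duration_result.isoformat())
print(difference)
```

Under these assumptions the calendar-day result lands at 12:00 local time and the fixed-duration result at 13:00, so the two differ by one hour.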

On Sun, Jun 22, 2025, at 04:54, Dewey Dunnington wrote:
I may be misunderstanding the MonthDayNano type, but I think it gives a
range of roughly +/- INT32_MAX days (5.8 million years?) at nanosecond
precision without considering the months component?
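
A quick back-of-the-envelope check of that figure, as a sketch only: it assumes a mean Gregorian year of 365.2425 days and looks at the 32-bit days component alone.

```python
# Largest value of a signed 32-bit integer (the days component).
INT32_MAX = 2**31 - 1

# Mean Gregorian year length in days (an assumption for this estimate).
DAYS_PER_YEAR = 365.2425

# Approximate span, in years, covered by the days component alone.
max_years = INT32_MAX / DAYS_PER_YEAR

print(f"{max_years:,.0f} years")
```

This comes out just under 5.9 million years in each direction.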

On Sat, Jun 21, 2025 at 12:58 PM David Li <lidav...@apache.org> wrote:

Hello Arrow devs,

There's an ongoing discussion in Iceberg [1] and Parquet [2] to define and
standardize new interval types. Of course, it would be ideal if these new
types had a canonical representation in Arrow. YEAR_MONTH is the same
as Arrow's month interval; DAY_TIME, however, is actually a 128-bit
nanosecond duration and hence I don't think it can be represented by
MonthDayNano or the duration type.

It might be interesting to consider whether there's some other way to
encode this type in Arrow (or if an extension type should be considered),
or find a way to define it that would more easily map onto an existing type
(while still meeting the Iceberg goal of being ANSI SQL compatible, which
apparently requires +/- 10000 years of range).
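
The range concern can be checked with simple arithmetic. The sketch below assumes a signed 64-bit nanosecond count (as in Arrow's duration type at nanosecond resolution) and uses 366-day years as a deliberately generous upper bound for 10000 years; the variable names are illustrative.

```python
INT64_MAX = 2**63 - 1
INT128_MAX = 2**127 - 1

NANOS_PER_DAY = 86_400 * 10**9
DAYS_PER_YEAR = 365.2425  # mean Gregorian year (assumption)

# Range of a signed 64-bit nanosecond count, in years.
int64_ns_years = INT64_MAX / NANOS_PER_DAY / DAYS_PER_YEAR

# Upper bound on the nanoseconds needed to cover 10000 years.
required_ns = 10_000 * 366 * NANOS_PER_DAY

fits_in_int64 = required_ns <= INT64_MAX
fits_in_int128 = required_ns <= INT128_MAX

print(f"int64 nanoseconds cover about {int64_ns_years:.0f} years")
print(f"10000 years fits in int64: {fits_in_int64}, in int128: {fits_in_int128}")
```

A 64-bit nanosecond count covers only about 292 years either side of zero, whereas a 128-bit count easily covers 10000 years.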

[1]: https://lists.apache.org/thread/65sxmjcfpvbp262dh73v5m4zjdgzt7j1
[2]: https://github.com/apache/parquet-format/pull/496

-David
