What about adding a canonical extension type so teams using Arrow don't have to keep re-inventing timestamps and duration types?
It could use Decimal128 as the storage type, since we are missing
128-bit integers (another debate).

--
Felipe

On Sun, Jun 22, 2025 at 9:48 AM Antoine Pitrou <anto...@python.org> wrote:
>
> The Arrow format defines data types; it doesn't mandate particular
> semantics for the operations that one might want to implement on them.
>
> An Arrow implementation could expose different operations (e.g. compute
> functions in Arrow C++ parlance) depending on the exact
> duration/interval addition desired.
>
> So I don't understand the need to add more interval/duration types to
> Arrow. I'm also extremely lukewarm towards adding new data types to
> Arrow. We are already in a situation where the most recent data types
> (binary view, list view, run-end encoding) are very poorly supported
> across the ecosystem.
>
> Regards
>
> Antoine.
>
>
> On 22/06/2025 at 02:34, David Li wrote:
> > MonthDayNano in Arrow uses calendar days, but as noted, Iceberg's
> > proposed DAY_TIME interval is a duration in Arrow parlance. So if you
> > add "1 day" in Iceberg (which is by definition exactly 86400 seconds)
> > to a timestamp right before a DST transition, you will be off by an
> > hour compared to doing the same in Arrow.
> >
> > On Sun, Jun 22, 2025, at 04:54, Dewey Dunnington wrote:
> >> I may be misunderstanding the MonthDayNano type, but I think it gives a
> >> range of roughly +/- INT32_MAX days (5.8 million years?) at nanosecond
> >> precision without considering the months component?
> >>
> >> On Sat, Jun 21, 2025 at 12:58 PM David Li <lidav...@apache.org> wrote:
> >>
> >>> Hello Arrow devs,
> >>>
> >>> There's an ongoing discussion in Iceberg [1] and Parquet [2] to define
> >>> and standardize new interval types. Of course, it would be ideal if
> >>> these new types had a canonical representation in Arrow.
> >>> While YEAR_MONTH is the same as Arrow's month interval, DAY_TIME is
> >>> actually a 128-bit nanosecond duration, and hence I don't think it
> >>> can be represented by MonthDayNano or the duration type.
> >>>
> >>> It might be interesting to consider whether there's some other way to
> >>> encode this type in Arrow (or whether an extension type should be
> >>> considered), or to find a way to define it that would more easily map
> >>> onto an existing type (while still meeting the Iceberg goal of being
> >>> ANSI SQL compatible, which apparently requires +/- 10000 years of
> >>> range).
> >>>
> >>> [1]: https://lists.apache.org/thread/65sxmjcfpvbp262dh73v5m4zjdgzt7j1
> >>> [2]: https://github.com/apache/parquet-format/pull/496
> >>>
> >>> -David
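
The range figures mentioned in this thread can be sanity-checked with a little arithmetic. The following is only a rough Python sketch, not part of any Arrow API: it assumes a mean Gregorian year of 365.2425 days and fixed 86400-second days (neither assumption comes from the thread), and every name in it is illustrative.

```python
# Rough range arithmetic for the types discussed in this thread.
# Assumptions (not from the thread): a mean Gregorian year of 365.2425 days
# and fixed 86400-second days. All names below are illustrative only.

NANOS_PER_DAY = 86_400 * 10**9
DAYS_PER_YEAR = 365.2425
DAYS_PER_10K_YEARS = 3_652_425  # 10000 * 365.2425, kept as an exact integer

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
DECIMAL128_MAX = 10**38 - 1  # largest unscaled value with 38 decimal digits


def ns_to_years(nanos: int) -> float:
    """Convert a nanosecond count to an approximate number of years."""
    return nanos / NANOS_PER_DAY / DAYS_PER_YEAR


# Arrow's duration type is a signed 64-bit count of a fixed unit;
# this is its reach when the unit is nanoseconds.
duration_ns_years = ns_to_years(INT64_MAX)

# MonthDayNano stores int32 months, int32 days and int64 nanoseconds.
# Ignoring months, the days field alone spans about +/- INT32_MAX days.
month_day_nano_years = INT32_MAX / DAYS_PER_YEAR

# Nanoseconds needed to cover +/- 10000 years as a single integer.
needed_ns = DAYS_PER_10K_YEARS * NANOS_PER_DAY
needed_bits = needed_ns.bit_length() + 1  # +1 for the sign bit

print(f"duration[ns] range:        +/- {duration_ns_years:,.0f} years")
print(f"MonthDayNano days range:   +/- {month_day_nano_years:,.0f} years")
print(f"10000 years in ns needs:   {needed_bits} bits (signed)")
print(f"fits in int64:             {needed_ns <= INT64_MAX}")
print(f"fits in Decimal128(38, 0): {needed_ns <= DECIMAL128_MAX}")
```

Under these assumptions, a 64-bit nanosecond count reaches only about 292 years, so a single-integer nanosecond duration covering +/- 10000 years needs more than 64 bits, while a 38-digit Decimal128 storage has ample room. The MonthDayNano figure counts calendar days, so it speaks to range only and not to the calendar-day versus 86400-second semantics raised above.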