Thanks Micah — I commented in the PR. Once we've settled on the details we can come up with an implementation / vote plan
On Tue, Apr 27, 2021 at 1:12 PM Micah Kornfield <emkornfi...@gmail.com> wrote: > To nudge this along I opend up https://github.com/apache/arrow/pull/10177 > > Comments welcome. > > On Sun, Apr 11, 2021 at 9:38 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > >> If there are no more comments on this maybe we should update the original >> RFC PR and ensure we are OK with it in principle (Dmitry do you want to do >> this or should we start a new PR)? I can try to work on the C++/Python and >> Java code in the next few weeks. >> >> >> On Sun, Apr 4, 2021 at 1:35 PM Micah Kornfield <emkornfi...@gmail.com> >> wrote: >> >>> Looking more at the Postgres spec and storage details, I'd be >>>> supportive of having a COMPLEX interval type which could be a packed >>>> type (possibly using the same 16-byte storage layout as Postgres -- >>>> depending on whether this complex interval needs granularity smaller >>>> than seconds, more analysis needed) >>> >>> >>> I agree we seem to be coalescing towards this representation which IIUC >>> is (32 bit months, 32 bit days, 64 bit for Seconds + fractional seconds). >>> I think the main questions here are: >>> >>> 1. Configurable sub-second granularity (I would again lean no and just >>> say nanoseconds, since this is the highest granularity other Arrow time >>> types support). >>> 2. Range compatibility with other interval types (if we fix at >>> nanoseconds IIUC there are some extreme values that couldn't be converted >>> to Arrow). I would guess that these extrema are rare enough that this >>> should not be an issue. >>> >>> I think that adding an entry to >>>> the IntervalUnit enum doesn't pose any forward compatibility problems >>>> (because implementations _should_ be able to recognized an unsupported >>>> unit? >>> >>> >>> I also agree with this, I can do an audit before an official proposal of >>> Java and C++. >>> >>> -Micah >>> >>> >>> >>> >>> >>> On Sun, Apr 4, 2021 at 11:10 AM Wes McKinney <wesmck...@gmail.com> >>> wrote: >>> >>>> Looking more at the Postgres spec and storage details, I'd be >>>> supportive of having a COMPLEX interval type which could be a packed >>>> type (possibly using the same 16-byte storage layout as Postgres -- >>>> depending on whether this complex interval needs granularity smaller >>>> than seconds, more analysis needed). I think that adding an entry to >>>> the IntervalUnit enum doesn't pose any forward compatibility problems >>>> (because implementations _should_ be able to recognized an unsupported >>>> unit?). It isn't wonderful to have 3 different types of intervals >>>> (ideally just SIMPLE and COMPLEX), but it seems like something we can >>>> live with. If we're going to do this, we should try to do it properly >>>> now since I would probably object at adding a 4th interval type if the >>>> need for it came up in the future. >>>> >>>> On Fri, Apr 2, 2021 at 5:37 PM Micah Kornfield <emkornfi...@gmail.com> >>>> wrote: >>>> > >>>> > > >>>> > > However it seems a little unfortunate that there is now way to >>>> represent a >>>> > > "common" interval like "1 week and 1 hour" with native arrow types >>>> > >>>> > >>>> > I might have misunderstood,but at least in postgres, I thought this >>>> boils >>>> > down to "0 months, 7 days, 3600 seconds". Since months is 0, this >>>> seems >>>> > like it fits squarely in the existing interval type Days_Mills. >>>> > >>>> > I thought what can't be represented today is "1 Year 1 Hour". It >>>> seems >>>> > like none of the proposals so far cover weeks as an explicit type? >>>> > >>>> > On Fri, Apr 2, 2021 at 2:42 PM Andrew Lamb <al...@influxdata.com> >>>> wrote: >>>> > >>>> > > I think it is plausible that we use Arrow structs to create a >>>> synthetic >>>> > > interval type for DataFusion (I don't have a compelling usecase to >>>> store >>>> > > the intervals themselves, or to expose them outside of DataFusion). >>>> > > >>>> > > However it seems a little unfortunate that there is now way to >>>> represent a >>>> > > "common" interval like "1 week and 1 hour" with native arrow types >>>> > > >>>> > > >>>> > > >>>> > > On Fri, Apr 2, 2021 at 4:38 PM Micah Kornfield < >>>> emkornfi...@gmail.com> >>>> > > wrote: >>>> > > >>>> > >> The real usecase I have is "postgres compatibility" >>>> > >> >>>> > >> >>>> > >> Yeah, I'm a little conflicted on this. A broader analysis might be >>>> > >> necessary and I'd welcome others thoughts, but at what point >>>> should we >>>> > >> mostly consider the type system closed? Should we be aiming for >>>> full >>>> > >> parity with ANSI SQL/Postgres SQL or something else? >>>> > >> >>>> > >> >>>> > >>> I have no known need for the actual postgres timestamp internal >>>> > >>> representation. >>>> > >> >>>> > >> >>>> > >> I suppose there is an edge case that the seconds range is larger >>>> for >>>> > >> microseconds compared to nanoseconds with the simple >>>> representation. But >>>> > >> that seems minor. >>>> > >> >>>> > >> On Fri, Apr 2, 2021 at 1:25 PM Andrew Lamb <al...@influxdata.com> >>>> wrote: >>>> > >> >>>> > >>> The real usecase I have is "postgres compatibility" - in the >>>> sense that >>>> > >>> we can write SQL queries / expressions that use postgres interval >>>> type [1] >>>> > >>> and corresponding expressions with the full postgres interval >>>> range. I have >>>> > >>> no known need for the actual postgres timestamp internal >>>> representation. >>>> > >>> >>>> > >>> A more vague usecase I would like (but do not strictly need) is >>>> for >>>> > >>> expressing intervals up to the same precision as IOx's underlying >>>> > >>> timestamps (which are stored as nanosecond precision). >>>> > >>> >>>> > >>> [1] >>>> > >>> >>>> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT >>>> > >>> >>>> > >>> "Internally interval values are stored as months, days, and >>>> seconds. >>>> > >>> This is done because the number of days in a month varies, and a >>>> day can >>>> > >>> have 23 or 25 hours if a daylight savings time adjustment is >>>> involved. The >>>> > >>> months and days fields are integers while the seconds field can >>>> store >>>> > >>> fractions. Because intervals are usually created from constant >>>> strings or >>>> > >>> timestamp subtraction, this storage method works well in most >>>> cases, >>>> > >>> but can cause unexpected results: >>>> > >>> " >>>> > >>> >>>> > >>> On Fri, Apr 2, 2021 at 3:53 PM Micah Kornfield < >>>> emkornfi...@gmail.com> >>>> > >>> wrote: >>>> > >>> >>>> > >>>> Andrew is the use-case you have simply postgres compatibility or >>>> is it >>>> > >>>> more >>>> > >>>> extensive? >>>> > >>>> >>>> > >>>> One potential problem with combining Month and Day fields, is >>>> that the >>>> > >>>> type >>>> > >>>> no longer has a defined sort order (the existing Day-Millisecond >>>> type >>>> > >>>> without assumptions, in particular because I don't think today >>>> there is >>>> > >>>> an >>>> > >>>> explicit constraint on the bounds for the millisecond component). >>>> > >>>> >>>> > >>>> -Micah >>>> > >>>> >>>> > >>>> >>>> > >>>> >>>> > >>>> On Wed, Mar 31, 2021 at 9:03 AM Antoine Pitrou < >>>> anto...@python.org> >>>> > >>>> wrote: >>>> > >>>> >>>> > >>>> > >>>> > >>>> > Le 31/03/2021 à 17:55, Micah Kornfield a écrit : >>>> > >>>> > > Thanks for the feedback. A couple of points here and some >>>> responses >>>> > >>>> > below. >>>> > >>>> > > >>>> > >>>> > > * One other question is whether the Nanoseconds should >>>> actually be >>>> > >>>> > > configurable (i.e. use milliseconds or microseconds). I >>>> would lean >>>> > >>>> > towards >>>> > >>>> > > no. >>>> > >>>> > >>>> > >>>> > Same for me. >>>> > >>>> > >>>> > >>>> > > * I'm also still not 100% convinced we need this as a first >>>> class >>>> > >>>> type in >>>> > >>>> > > arrow or if we should be looking more closely at the Struct >>>> (in the >>>> > >>>> Arrow >>>> > >>>> > > sense) based implementation. In the future where alternative >>>> > >>>> encodings >>>> > >>>> > are >>>> > >>>> > > supported, this could allow for much smaller footprints for >>>> this >>>> > >>>> type. >>>> > >>>> > >>>> > >>>> > Having a "packed" first class type allows for better locality >>>> when >>>> > >>>> > accessing data. It doesn't sound very likely that you'd >>>> access only >>>> > >>>> one >>>> > >>>> > component of the interval. >>>> > >>>> > >>>> > >>>> > But I have no idea how important this is, and temporal >>>> datetypes are >>>> > >>>> > generally cumbersome to add support for (conversions, >>>> arithmetic, >>>> > >>>> etc.), >>>> > >>>> > so it would be nice to avoid adding too many of them :-) >>>> > >>>> > >>>> > >>>> > Regards >>>> > >>>> > >>>> > >>>> > Antoine. >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > > >>>> > >>>> > > The 3 >>>> > >>>> > >> field implementation doesn't seem to have any way to >>>> represent >>>> > >>>> integral >>>> > >>>> > >> days, so I am also not sure about that one. >>>> > >>>> > > >>>> > >>>> > > >>>> > >>>> > > Sorry this was an email gaffe. I intended Month (32 bit >>>> int), Day >>>> > >>>> (32 >>>> > >>>> > bit >>>> > >>>> > > int), Nanosecond (64 bit int). >>>> > >>>> > > >>>> > >>>> > > OTOH I don't really understand the point of supporting "the >>>> most >>>> > >>>> > >> reasonable ranges for Year, Month and Nanoseconds >>>> independently". >>>> > >>>> What >>>> > >>>> > >> does it bring to encode more than one month in the >>>> nanoseconds >>>> > >>>> field? >>>> > >>>> > > >>>> > >>>> > > >>>> > >>>> > > I'm happy with simplicity. In the past there has been some >>>> > >>>> reference to >>>> > >>>> > > people wanting to store very large timestamps (fall out of >>>> > >>>> Nanoseconds >>>> > >>>> > max >>>> > >>>> > > representable value) but we've concluded that this wasn't >>>> something >>>> > >>>> that >>>> > >>>> > we >>>> > >>>> > > wanted to really support. >>>> > >>>> > > >>>> > >>>> > > >>>> > >>>> > > >>>> > >>>> > > >>>> > >>>> > > >>>> > >>>> > > >>>> > >>>> > > On Wed, Mar 31, 2021 at 4:49 AM Antoine Pitrou < >>>> anto...@python.org> >>>> > >>>> > wrote: >>>> > >>>> > > >>>> > >>>> > >> >>>> > >>>> > >> I would favour the following characteristics : >>>> > >>>> > >> - support for nanoseconds (especially as other Arrow >>>> temporal types >>>> > >>>> > >> support it) >>>> > >>>> > >> - easy to handle (which excludes the ZetaSQL representtaion >>>> IMHO) >>>> > >>>> > >> >>>> > >>>> > >> OTOH I don't really understand the point of supporting "the >>>> most >>>> > >>>> > >> reasonable ranges for Year, Month and Nanoseconds >>>> independently". >>>> > >>>> What >>>> > >>>> > >> does it bring to encode more than one month in the >>>> nanoseconds >>>> > >>>> field? >>>> > >>>> > >> You can already use the Duration type for that. >>>> > >>>> > >> >>>> > >>>> > >> Regards >>>> > >>>> > >> >>>> > >>>> > >> Antoine. >>>> > >>>> > >> >>>> > >>>> > >> >>>> > >>>> > >> Le 31/03/2021 à 05:48, Micah Kornfield a écrit : >>>> > >>>> > >>> To follow-up on this conversation I did some analysis on >>>> interval >>>> > >>>> > types: >>>> > >>>> > >>> >>>> > >>>> > >>> >>>> > >>>> > >> >>>> > >>>> > >>>> > >>>> >>>> https://docs.google.com/document/d/1i1E_fdQ_xODZcAhsV11Pfq27O50k679OYHXFJpm9NS0/edit >>>> > >>>> > >> Please feel free to add more details/systems I missed. >>>> > >>>> > >>> >>>> > >>>> > >>> Given the disparate requirements of different systems I >>>> think the >>>> > >>>> > >> following might make sense for official types (if there >>>> isn't >>>> > >>>> > consensus, I >>>> > >>>> > >> might try to contributation extension Array implementations >>>> for >>>> > >>>> them to >>>> > >>>> > >> Java and C++/Python separately). >>>> > >>>> > >>> >>>> > >>>> > >>> 1. 3 fields: Year (32 bit), Month (32 bit), Nanoseconds >>>> (64 bit) >>>> > >>>> all >>>> > >>>> > >> signed. >>>> > >>>> > >>> 2. Postgres representation (Downside is it doesn't support >>>> > >>>> > Nanoseconds, >>>> > >>>> > >> only microseconds). >>>> > >>>> > >>> 3. ZetaSQL implementation (Requires some bit >>>> manipulation) but >>>> > >>>> > supports >>>> > >>>> > >> the most reasonable ranges for Year, Month and Nanoseconds >>>> > >>>> > independently. >>>> > >>>> > >>> >>>> > >>>> > >>> Thoughts? >>>> > >>>> > >>> >>>> > >>>> > >>> Micah >>>> > >>>> > >>> >>>> > >>>> > >>> On 2021/02/18 04:30:55 Micah Kornfield wrote: >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> I didn’t find any page/documentation on how to do RFC in >>>> Arrow >>>> > >>>> > >> protocol, >>>> > >>>> > >>>>> so can anyone point me to it or PR with email will be >>>> enough? >>>> > >>>> > >>>> >>>> > >>>> > >>>> That is enough to start discussion. Before formal >>>> acceptance and >>>> > >>>> > >> merging >>>> > >>>> > >>>> of the PR there needs to be a Java and C++ >>>> implementations for >>>> > >>>> the >>>> > >>>> > type >>>> > >>>> > >>>> that pass integration tests. At the time this guideline >>>> was >>>> > >>>> > instituted >>>> > >>>> > >>>> Java and C++ were considered the "reference" >>>> implementations (I >>>> > >>>> think >>>> > >>>> > >> they >>>> > >>>> > >>>> still have the most complete integration test coverage). >>>> > >>>> > >>>> >>>> > >>>> > >>>> My understanding is that the current modelling of >>>> intervals >>>> > >>>> mimics SQL >>>> > >>>> > >>>> standards (e.g. SQL Server [1]). So it would also be >>>> good to >>>> > >>>> step >>>> > >>>> > back >>>> > >>>> > >> and >>>> > >>>> > >>>> understand what problem DF is trying to solve and how it >>>> differs >>>> > >>>> from >>>> > >>>> > >> other >>>> > >>>> > >>>> SQL implementations. I'd be hesitant to accept COMPLEX >>>> as a new >>>> > >>>> type >>>> > >>>> > >>>> without a much deeper analysis into calendar >>>> representations >>>> > >>>> within >>>> > >>>> > >> Arrow >>>> > >>>> > >>>> and how they relate to other existing systems (e.g. Hive >>>> and some >>>> > >>>> > >>>> assortment of existing SQL databases). For instance the >>>> current >>>> > >>>> > >> modelling >>>> > >>>> > >>>> of timestamps does not lend itself to constructing a >>>> COMPLEX >>>> > >>>> interval >>>> > >>>> > >> type >>>> > >>>> > >>>> particularly well. (Duration was introduced for this >>>> reason). >>>> > >>>> > >>>> >>>> > >>>> > >>>> I think both Wes's suggestion of FixedSizeBinary and >>>> Andrew's of >>>> > >>>> > >> composing >>>> > >>>> > >>>> the with a struct are good stop-gaps. These obviously >>>> have >>>> > >>>> different >>>> > >>>> > >>>> trade-offs. Ultimately, it would be good to define common >>>> > >>>> extension >>>> > >>>> > >> types >>>> > >>>> > >>>> that can represent this use-case if there really is >>>> demand for >>>> > >>>> it (if >>>> > >>>> > it >>>> > >>>> > >>>> doesn't become a top level type). >>>> > >>>> > >>>> >>>> > >>>> > >>>> [1] >>>> > >>>> > >>>> >>>> > >>>> > >> >>>> > >>>> > >>>> > >>>> >>>> https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types?view=sql-server-ver15 >>>> > >>>> > >>>> >>>> > >>>> > >>>> -Micah >>>> > >>>> > >>>> >>>> > >>>> > >>>> On Wed, Feb 17, 2021 at 2:05 PM Andrew Lamb < >>>> > >>>> al...@influxdata.com> >>>> > >>>> > >> wrote: >>>> > >>>> > >>>> >>>> > >>>> > >>>>> That is a great suggestion Wes, thank you. >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> I wonder if we could get away with a 128 bit >>>> representation >>>> > >>>> that is >>>> > >>>> > the >>>> > >>>> > >>>>> concatenation of the two existing interval types >>>> > >>>> > (YearMonth)(DayTime). >>>> > >>>> > >> Or >>>> > >>>> > >>>>> maybe even define a `struct` type with those fields that >>>> is >>>> > >>>> used by >>>> > >>>> > >>>>> DataFusion. >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> Basically, given our reading of the Arrow spec[1], it is >>>> > >>>> currently >>>> > >>>> > not >>>> > >>>> > >>>>> possible to precisely represent an interval that has both >>>> > >>>> monthly and >>>> > >>>> > >>>>> sub-montly granularity. >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> As Dmtry says, if you have an interval seemingly simple >>>> like 1 >>>> > >>>> > month, >>>> > >>>> > >> 1 >>>> > >>>> > >>>>> day >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> Using IntervalUnit(YEAR_MONTH) can't represent the 1 day >>>> > >>>> > >>>>> Using IntervalUnit(DAY_TIME) can't represent the month as >>>> > >>>> different >>>> > >>>> > >> months >>>> > >>>> > >>>>> have different numbers of days >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> [1] >>>> > >>>> > >>>>> >>>> > >>>> > >> >>>> > >>>> >>>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L249-L260 >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> >>>> > >>>> > >>>>> On Wed, Feb 17, 2021 at 5:01 PM Wes McKinney < >>>> > >>>> wesmck...@gmail.com> >>>> > >>>> > >> wrote: >>>> > >>>> > >>>>> >>>> > >>>> > >>>>>> On Wed, Feb 17, 2021 at 3:46 PM <t...@dmtry.me> wrote: >>>> > >>>> > >>>>>>> >>>> > >>>> > >>>>>>>> It's unclear to me that this needs to be introduced >>>> into the >>>> > >>>> > >>>>> top-level >>>> > >>>> > >>>>>>> >>>> > >>>> > >>>>>>> Similar thing to columnar format, How to store >>>> interval like 1 >>>> > >>>> > month >>>> > >>>> > >> 1 >>>> > >>>> > >>>>>> day 1 hour? It’s not possible to do it without >>>> converting 1 >>>> > >>>> month to >>>> > >>>> > >> 30 >>>> > >>>> > >>>>>> days, which is a bad way. >>>> > >>>> > >>>>>>> >>>> > >>>> > >>>>>> >>>> > >>>> > >>>>>> Presumably you can represent a complex interval in a >>>> fixed >>>> > >>>> number of >>>> > >>>> > >>>>>> bytes, and then embed the data in a FixedSizeBinary >>>> type. You >>>> > >>>> can >>>> > >>>> > >>>>>> adorn this type with extension type metadata so that >>>> > >>>> DataFusion can >>>> > >>>> > >>>>>> then apply Interval semantics to it. This could also >>>> serve as >>>> > >>>> an >>>> > >>>> > >>>>>> interim strategy for you to proceed with implementation >>>> while >>>> > >>>> > >>>>>> proposing a top-level type to the Arrow format (which >>>> may or >>>> > >>>> may not >>>> > >>>> > >>>>>> be accepting) so you aren't blocked on acceptance of >>>> changes >>>> > >>>> into >>>> > >>>> > >>>>>> Schema.fbs. >>>> > >>>> > >>>>>> >>>> > >>>> > >>>>>>>> On 17 Feb 2021, at 21:02, Wes McKinney < >>>> wesmck...@gmail.com> >>>> > >>>> > wrote: >>>> > >>>> > >>>>>>>> >>>> > >>>> > >>>>>>>> It's unclear to me that this needs to be introduced >>>> into the >>>> > >>>> > >>>>> top-level >>>> > >>>> > >>>>>>>> columnar format without more analysis — have you >>>> considered >>>> > >>>> > >>>>>>>> implementing this for DataFusion as an extension type >>>> for >>>> > >>>> the time >>>> > >>>> > >>>>>>>> being? >>>> > >>>> > >>>>>>>> >>>> > >>>> > >>>>>>>> On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me >>>> <mailto: >>>> > >>>> > >> t...@dmtry.me >>>> > >>>> > >>>>>> >>>> > >>>> > >>>>>> <t...@dmtry.me <mailto:t...@dmtry.me>> wrote: >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> Hi, >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> For now, There are only two types of IntervalUnit >>>> inside >>>> > >>>> Arrow: >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> - YearMonth - month stored as int32 >>>> > >>>> > >>>>>>>>> - DayTime - days as int32 and time in milliseconds >>>> as in32. >>>> > >>>> > Total >>>> > >>>> > >>>>>> (64 bites) >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> Since DF is using Arrow, It’s not possible to store >>>> > >>>> “Complex” >>>> > >>>> > >>>>>> intervals such 1 MONTH 1 DAY 1 HOUR. >>>> > >>>> > >>>>>>>>> I think, the best way to understand the problem will >>>> be to >>>> > >>>> read a >>>> > >>>> > >>>>>> comment from DF codebase: >>>> > >>>> > >>>>>> >>>> > >>>> > >>>>> >>>> > >>>> > >> >>>> > >>>> > >>>> > >>>> >>>> https://github.com/apache/arrow/blob/bca7d2fe84ccd8fc1129cb4d85448eb0779c52c3/rust/datafusion/src/sql/planner.rs#L1148 >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> // Interval is tricky thing >>>> > >>>> > >>>>>>>>> // 1 day is not 24 hours because timezones, >>>> 1 year >>>> > >>>> != >>>> > >>>> > >>>>> 365/364! >>>> > >>>> > >>>>>> 30 days != 1 month >>>> > >>>> > >>>>>>>>> // The true way to store and calculate >>>> intervals >>>> > >>>> is to >>>> > >>>> > >> store >>>> > >>>> > >>>>>> it as it defined >>>> > >>>> > >>>>>>>>> // Due the fact that Arrow supports only >>>> two types >>>> > >>>> > >> YearMonth >>>> > >>>> > >>>>>> (month) and DayTime (day, time) >>>> > >>>> > >>>>>>>>> // It's not possible to store complex >>>> intervals >>>> > >>>> > >>>>>>>>> // It's possible to do select (NOW() + >>>> INTERVAL '1 >>>> > >>>> > year') + >>>> > >>>> > >>>>>> INTERVAL '1 day'; as workaround >>>> > >>>> > >>>>>>>>> if result_month != 0 && (result_days != 0 || >>>> > >>>> > result_millis >>>> > >>>> > >> != >>>> > >>>> > >>>>>> 0) { >>>> > >>>> > >>>>>>>>> return >>>> > >>>> Err(DataFusionError::NotImplemented(format!( >>>> > >>>> > >>>>>>>>> "DF does not support intervals that >>>> have >>>> > >>>> both a >>>> > >>>> > >>>>>> Year/Month part as well as Days/Hours/Mins/Seconds: >>>> {:?}. >>>> > >>>> Hint: try >>>> > >>>> > >>>>>> breaking the interval into two parts, one with >>>> Year/Month and >>>> > >>>> the >>>> > >>>> > >> other >>>> > >>>> > >>>>>> with Days/Hours/Mins/Seconds - e.g. (NOW() + INTERVAL '1 >>>> > >>>> year') + >>>> > >>>> > >>>>> INTERVAL >>>> > >>>> > >>>>>> '1 day'", >>>> > >>>> > >>>>>>>>> value >>>> > >>>> > >>>>>>>>> ))); >>>> > >>>> > >>>>>>>>> } >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> I prepared a PR >>>> > >>>> https://github.com/apache/arrow/pull/9516/files >>>> > >>>> > < >>>> > >>>> > >>>>>> https://github.com/apache/arrow/pull/9516/files> < >>>> > >>>> > >>>>>> https://github.com/apache/arrow/pull/9516/files < >>>> > >>>> > >>>>>> https://github.com/apache/arrow/pull/9516/files>> that >>>> > >>>> introduce a >>>> > >>>> > >> new >>>> > >>>> > >>>>>> type for IntervalUnit called Complex, that store both >>>> > >>>> YearMonth and >>>> > >>>> > >>>>> DayTime >>>> > >>>> > >>>>>> to support complex interval. >>>> > >>>> > >>>>>>>>> I didn’t find any page/documentation on how to do >>>> RFC in >>>> > >>>> Arrow >>>> > >>>> > >>>>>> protocol, so can anyone point me to it or PR with email >>>> will be >>>> > >>>> > >> enough? >>>> > >>>> > >>>>>>>>> >>>> > >>>> > >>>>>>>>> Thanks. >>>> > >>>> > >>>>>>> >>>> > >>>> > >>>>>> >>>> > >>>> > >>>>> >>>> > >>>> > >>>> >>>> > >>>> > >> >>>> > >>>> > > >>>> > >>>> > >>>> > >>>> >>>> > >>> >>>> >>>