OK, I think I have completed the initial changes for the new interval type in https://github.com/apache/arrow/pull/10177
The code changes still need to be reviewed, but I don't think that should stop a vote. I'll start a vote on Monday unless there are more comments on the format changes.

Thanks,
Micah

On Wed, Aug 11, 2021 at 1:38 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> As an update, I've gotten basic integration testing working in Java and C++ along with the format proposal updates [1].
>
> I have a little bit more work to do on the initial implementations (make CI happy, add unit tests in Java) but I think this is getting close to the point that we can vote on it. For those interested, please peruse the implementations and leave any comments.
>
> I'm hoping to wrap up the CI and Java test sometime tomorrow and if reviewers for the implementations have bandwidth hopefully address any concerns and start a vote sometime next week.
>
> I plan on adding integration with Python/Pandas bindings in follow-up PRs but likely won't have bandwidth for much more work here.
>
> [1] https://github.com/apache/arrow/pull/10177
>
> On Thu, May 6, 2021 at 9:05 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> Ah, that makes sense to wait then.
>>
>> On Thu, May 6, 2021 at 10:55 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>
>>> I'll address the feedback. I think in the past we've waited for implementations in java and c++ with integration tests before formally voting. If there is no more feedback I can start looking at implementations (happy to have help)
>>>
>>> On Thursday, May 6, 2021, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>
>>>> The PR looks good. I just left some comments about typos. I would say it's probably about time to call a vote. Anywhere else where we should be soliciting feedback?
>>>>
>>>> On Mon, May 3, 2021 at 2:17 PM Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>>>>>
>>>>> Good idea, I've created a JIRA issue:
>>>>>
>>>>> https://issues.apache.org/jira/browse/ARROW-12637
>>>>>
>>>>> And named it range to avoid confusion with intervals...
>>>>> Though confusion will stay as it is called interval in Pandas and in logic (Allen's interval algebra)
>>>>>
>>>>> BR,
>>>>>
>>>>> Jacek
>>>>>
>>>>> On Mon, May 3, 2021 at 18:05, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Jacek,
>>>>>> This seems like reasonable functionality. I think the problem comes in two parts:
>>>>>> 1. This might be a good candidate for a "Well Known"/Officially supported Extension type. I can think of a few different representations but I would guess something like Struct[start: T, end: T] with well defined extension metadata to define open/closed on start and end might be the best (we should probably spin this off into a separate discussion thread).
>>>>>> 2. Adding the right computation Kernels to work with the type.
>>>>>>
>>>>>> Do you want to start a new thread or open up some JIRAs to track this work?
>>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>> On Mon, May 3, 2021 at 5:32 AM Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>>>>>>>
>>>>>>> Sorry, my mistake.
>>>>>>>
>>>>>>> You are right - I meant anchored intervals as in pandas - ones with defined start and end - and I think many future users will make the same mistake.
>>>>>>>
>>>>>>> I would love to be able to do fast overlap joins on arrow level.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Jacek
>>>>>>>
>>>>>>> On Sun, May 2, 2021 at 23:06, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I also don't understand the comment about closed / open / semi-open intervals. Perhaps there is a confusion, since "interval" as we mean it here is called a "time delta" in some other projects. An interval here does not refer to a time span with a distinct start and end point (I understand this might be confusing to a pandas user since pandas has an interval data type where each value is a tuple of arbitrary start/end).
>>>>>>>>
>>>>>>>> On Sun, May 2, 2021 at 3:46 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Jacek,
>>>>>>>>> I'm not sure I fully understand the proposal, could you elaborate with more examples/details? For instance DAY_TIME isn't just a UINT64, it actually contains 2 separate fields (days and milliseconds).
>>>>>>>>>
>>>>>>>>> In terms of closed vs half-open, in my limited understanding, that is more a concern of functions using interval types rather than the type itself. For instance a quick search of postgres [1] docs only talks about half-open in relation to the "Overlaps" operator.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Micah
>>>>>>>>>
>>>>>>>>> [1] https://www.postgresql.org/docs/9.1/functions-datetime.html
>>>>>>>>>
>>>>>>>>> On Sun, May 2, 2021 at 12:25 AM Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> I wonder if it were possible to have generic interval with integers of specified size just to have common base for interval arithmetic.
>>>>>>>>>>
>>>>>>>>>> Then user can convert their period to ordinals and use the arithmetic (joining, deoverlapping, common parts, explosion etc.).
>>>>>>>>>>
>>>>>>>>>> So YEAR_MONTH and DAY_TIME would be just special cases of INTERVAL_UINT32 and INTERVAL_UINT64
>>>>>>>>>>
>>>>>>>>>> Also I believe it is worth stating whether there are only closed intervals or open/semi-open ones are allowed as well.
>>>>>>>>>>
>>>>>>>>>> I believe I am just one of many reinventing the wheel here and writing own versions of the above.
>>>>>>>>>>
>>>>>>>>>> BR,
>>>>>>>>>>
>>>>>>>>>> Jacek
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 2, 2021 at 21:53, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Andrew is the use-case you have simply postgres compatibility or is it more extensive?
>>>>>>>>>>>
>>>>>>>>>>> One potential problem with combining Month and Day fields, is that the type no longer has a defined sort order (the existing Day-Millisecond type without assumptions, in particular because I don't think today there is an explicit constraint on the bounds for the millisecond component).
>>>>>>>>>>>
>>>>>>>>>>> -Micah
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 31, 2021 at 9:03 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 31/03/2021 at 17:55, Micah Kornfield wrote:
>>>>>>>>>>>>> Thanks for the feedback. A couple of points here and some responses below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * One other question is whether the Nanoseconds should actually be configurable (i.e. use milliseconds or microseconds). I would lean towards no.
>>>>>>>>>>>>
>>>>>>>>>>>> Same for me.
>>>>>>>>>>>>
>>>>>>>>>>>>> * I'm also still not 100% convinced we need this as a first class type in arrow or if we should be looking more closely at the Struct (in the Arrow sense) based implementation. In the future where alternative encodings are supported, this could allow for much smaller footprints for this type.
>>>>>>>>>>>>
>>>>>>>>>>>> Having a "packed" first class type allows for better locality when accessing data. It doesn't sound very likely that you'd access only one component of the interval.
>>>>>>>>>>>>
>>>>>>>>>>>> But I have no idea how important this is, and temporal data types are generally cumbersome to add support for (conversions, arithmetic, etc.), so it would be nice to avoid adding too many of them :-)
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>> Antoine.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The 3 field implementation doesn't seem to have any way to represent integral days, so I am also not sure about that one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry this was an email gaffe. I intended Month (32 bit int), Day (32 bit int), Nanosecond (64 bit int).
>>>>>>>>>>>>>
>>>>>>>>>>>>>> OTOH I don't really understand the point of supporting "the most reasonable ranges for Year, Month and Nanoseconds independently". What does it bring to encode more than one month in the nanoseconds field?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm happy with simplicity. In the past there has been some reference to people wanting to store very large timestamps (fall out of Nanoseconds max representable value) but we've concluded that this wasn't something that we wanted to really support.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 31, 2021 at 4:49 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would favour the following characteristics:
>>>>>>>>>>>>>> - support for nanoseconds (especially as other Arrow temporal types support it)
>>>>>>>>>>>>>> - easy to handle (which excludes the ZetaSQL representation IMHO)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> OTOH I don't really understand the point of supporting "the most reasonable ranges for Year, Month and Nanoseconds independently". What does it bring to encode more than one month in the nanoseconds field? You can already use the Duration type for that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Antoine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 31/03/2021 at 05:48, Micah Kornfield wrote:
>>>>>>>>>>>>>>> To follow-up on this conversation I did some analysis on interval types:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://docs.google.com/document/d/1i1E_fdQ_xODZcAhsV11Pfq27O50k679OYHXFJpm9NS0/edit
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please feel free to add more details/systems I missed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Given the disparate requirements of different systems I think the following might make sense for official types (if there isn't consensus, I might try to contribute extension Array implementations for them to Java and C++/Python separately).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. 3 fields: Year (32 bit), Month (32 bit), Nanoseconds (64 bit) all signed.
>>>>>>>>>>>>>>> 2. Postgres representation (Downside is it doesn't support Nanoseconds, only microseconds).
>>>>>>>>>>>>>>> 3. ZetaSQL implementation (Requires some bit manipulation) but supports the most reasonable ranges for Year, Month and Nanoseconds independently.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2021/02/18 04:30:55 Micah Kornfield wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I didn’t find any page/documentation on how to do RFC in Arrow protocol, so can anyone point me to it or PR with email will be enough?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That is enough to start discussion. Before formal acceptance and merging of the PR there need to be Java and C++ implementations for the type that pass integration tests.
>>>>>>>>>>>>>>>> At the time this guideline was instituted Java and C++ were considered the "reference" implementations (I think they still have the most complete integration test coverage).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My understanding is that the current modelling of intervals mimics SQL standards (e.g. SQL Server [1]). So it would also be good to step back and understand what problem DF is trying to solve and how it differs from other SQL implementations. I'd be hesitant to accept COMPLEX as a new type without a much deeper analysis into calendar representations within Arrow and how they relate to other existing systems (e.g. Hive and some assortment of existing SQL databases). For instance the current modelling of timestamps does not lend itself to constructing a COMPLEX interval type particularly well. (Duration was introduced for this reason).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think both Wes's suggestion of FixedSizeBinary and Andrew's of composing it with a struct are good stop-gaps. These obviously have different trade-offs.
>>>>>>>>>>>>>>>> Ultimately, it would be good to define common extension types that can represent this use-case if there really is demand for it (if it doesn't become a top level type).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types?view=sql-server-ver15
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Micah
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Feb 17, 2021 at 2:05 PM Andrew Lamb <al...@influxdata.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That is a great suggestion Wes, thank you.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I wonder if we could get away with a 128 bit representation that is the concatenation of the two existing interval types (YearMonth)(DayTime). Or maybe even define a `struct` type with those fields that is used by DataFusion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Basically, given our reading of the Arrow spec[1], it is currently not possible to precisely represent an interval that has both monthly and sub-monthly granularity.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As Dmtry says, if you have an interval seemingly simple like 1 month, 1 day
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Using IntervalUnit(YEAR_MONTH) can't represent the 1 day
>>>>>>>>>>>>>>>>> Using IntervalUnit(DAY_TIME) can't represent the month as different months have different numbers of days
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L249-L260
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Feb 17, 2021 at 5:01 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Feb 17, 2021 at 3:46 PM <t...@dmtry.me> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It's unclear to me that this needs to be introduced into the top-level
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Similar thing to columnar format, How to store interval like 1 month 1 day 1 hour? It’s not possible to do it without converting 1 month to 30 days, which is a bad way.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Presumably you can represent a complex interval in a fixed number of bytes, and then embed the data in a FixedSizeBinary type.
>>>>>>>>>>>>>>>>>> You can adorn this type with extension type metadata so that DataFusion can then apply Interval semantics to it. This could also serve as an interim strategy for you to proceed with implementation while proposing a top-level type to the Arrow format (which may or may not be accepted) so you aren't blocked on acceptance of changes into Schema.fbs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 17 Feb 2021, at 21:02, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It's unclear to me that this needs to be introduced into the top-level columnar format without more analysis. Have you considered implementing this for DataFusion as an extension type for the time being?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Feb 17, 2021 at 11:59 AM <t...@dmtry.me> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For now, there are only two types of IntervalUnit inside Arrow:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - YearMonth - month stored as int32
>>>>>>>>>>>>>>>>>>>>> - DayTime - days as int32 and time in milliseconds as int32.
>>>>>>>>>>>>>>>>>>>>> Total (64 bits)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Since DF is using Arrow, it’s not possible to store “Complex” intervals such as 1 MONTH 1 DAY 1 HOUR.
>>>>>>>>>>>>>>>>>>>>> I think the best way to understand the problem will be to read a comment from the DF codebase:
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/arrow/blob/bca7d2fe84ccd8fc1129cb4d85448eb0779c52c3/rust/datafusion/src/sql/planner.rs#L1148
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     // Interval is tricky thing
>>>>>>>>>>>>>>>>>>>>>     // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
>>>>>>>>>>>>>>>>>>>>>     // The true way to store and calculate intervals is to store it as it defined
>>>>>>>>>>>>>>>>>>>>>     // Due the fact that Arrow supports only two types YearMonth (month) and DayTime (day, time)
>>>>>>>>>>>>>>>>>>>>>     // It's not possible to store complex intervals
>>>>>>>>>>>>>>>>>>>>>     // It's possible to do select (NOW() + INTERVAL '1 year') + INTERVAL '1 day'; as workaround
>>>>>>>>>>>>>>>>>>>>>     if result_month != 0 && (result_days != 0 || result_millis != 0) {
>>>>>>>>>>>>>>>>>>>>>         return Err(DataFusionError::NotImplemented(format!(
>>>>>>>>>>>>>>>>>>>>>             "DF does not support intervals that have both a Year/Month part as well as Days/Hours/Mins/Seconds: {:?}. Hint: try breaking the interval into two parts, one with Year/Month and the other with Days/Hours/Mins/Seconds - e.g. (NOW() + INTERVAL '1 year') + INTERVAL '1 day'",
>>>>>>>>>>>>>>>>>>>>>             value
>>>>>>>>>>>>>>>>>>>>>         )));
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I prepared a PR https://github.com/apache/arrow/pull/9516/files that introduces a new type for IntervalUnit called Complex, which stores both YearMonth and DayTime to support complex intervals.
>>>>>>>>>>>>>>>>>>>>> I didn’t find any page/documentation on how to do RFC in Arrow protocol, so can anyone point me to it or PR with email will be enough?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks.
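
For readers skimming the thread, the Month (32 bit int), Day (32 bit int), Nanosecond (64 bit int) layout mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions and is not taken from the PR: the class name `MonthDayNano`, the helper `add_to_date`, the little-endian byte order, and the "clamp to end of month" rule are all hypothetical choices made for the example.

```python
import calendar
import struct
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed layout for illustration: two signed 32-bit ints followed by one
# signed 64-bit int, little-endian, with no padding (16 bytes in total).
_LAYOUT = struct.Struct("<iiq")


@dataclass(frozen=True)
class MonthDayNano:
    """Hypothetical stand-in for a month/day/nanosecond interval value."""

    months: int
    days: int
    nanoseconds: int

    def pack(self) -> bytes:
        # A 16-byte value like this could be carried in a fixed-size binary column.
        return _LAYOUT.pack(self.months, self.days, self.nanoseconds)

    @classmethod
    def unpack(cls, raw: bytes) -> "MonthDayNano":
        return cls(*_LAYOUT.unpack(raw))


def add_to_date(start: date, interval: MonthDayNano) -> date:
    """Apply the month and day parts to a calendar date.

    Assumption: the day of month is clamped when the target month is shorter.
    The nanosecond part is ignored because a date has no time of day.
    """
    total_months = start.year * 12 + (start.month - 1) + interval.months
    year, month_index = divmod(total_months, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    shifted = date(year, month_index + 1, min(start.day, last_day))
    return shifted + timedelta(days=interval.days)


one_month_one_day = MonthDayNano(months=1, days=1, nanoseconds=0)
raw = one_month_one_day.pack()
restored = MonthDayNano.unpack(raw)

# "1 month 1 day" depends on the starting date, so it cannot be folded into
# a fixed number of days without losing meaning.
from_feb = add_to_date(date(2021, 2, 1), one_month_one_day)
fixed_31_days = date(2021, 2, 1) + timedelta(days=31)
```

Starting from 2021-02-01, the calendar-aware result and the fixed 31-day offset land on different dates, which is the reason given in the thread for keeping months, days, and sub-day time as separate fields.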