Thanks, Weston, for the insight. For the short term we are going to try to
standardize on "microseconds" as the time unit, to be compatible with
Substrait, and pay the cost of converting to nanoseconds (e.g., when passing
to pandas) when needed.
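
To make that conversion cost concrete, here is a minimal sketch using plain
integer epoch values. The helper names (ns_to_us, us_to_ns) are hypothetical
and are not part of Arrow, Substrait, or pandas; the point is only that going
from nanoseconds to microseconds drops the last three digits, so the round
trip is lossy.

```python
NS_PER_US = 1_000  # nanoseconds per microsecond


def ns_to_us(ts_ns: int) -> int:
    """Convert epoch nanoseconds to epoch microseconds.

    Uses floor division, so sub-microsecond digits are discarded.
    (Hypothetical helper, for illustration only.)
    """
    return ts_ns // NS_PER_US


def us_to_ns(ts_us: int) -> int:
    """Convert epoch microseconds back to epoch nanoseconds (exact)."""
    return ts_us * NS_PER_US


# An arbitrary nanosecond timestamp, e.g. as returned by an internal service.
ts_ns = 1_678_397_760_123_456_789
ts_us = ns_to_us(ts_ns)          # value that fits a microsecond timestamp
round_tripped = us_to_ns(ts_us)  # back to nanoseconds for pandas
lost_ns = ts_ns - round_tripped  # sub-microsecond remainder that was dropped
```

Whether that loss matters depends on whether the source data actually carries
sub-microsecond precision.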

Longer term, I think option (3) is probably the most practical (although
perhaps not worthwhile if the performance cost of converting between
microseconds and nanoseconds isn't too bad in practice).
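
For context on option (2) in the quoted message, the range limit can be
checked with a few lines of standard-library Python. This is only a sketch of
the arithmetic, assuming a signed 64-bit count from the Unix epoch; the
function and constant names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def max_ns_timestamp() -> datetime:
    """Latest instant a signed 64-bit nanosecond count can represent.

    datetime only has microsecond resolution, so the last three
    nanosecond digits are dropped here.
    """
    return EPOCH + timedelta(microseconds=INT64_MAX // 1_000)


def ns_since_epoch(dt: datetime) -> int:
    """Whole nanoseconds between the epoch and dt (exact integer math)."""
    delta = dt - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


latest_ns = max_ns_timestamp()
end_of_9999 = datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

# True if the end of year 9999 does not fit in int64 nanoseconds.
overflows_ns = ns_since_epoch(end_of_9999) > INT64_MAX
# The same instant in microseconds, for comparison.
fits_us = ns_since_epoch(end_of_9999) // 1_000 <= INT64_MAX
```

Here latest_ns falls in the year 2262, which is why a nanosecond timestamp
cannot simply stand in for the microsecond one.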



On Thu, Mar 9, 2023 at 1:36 PM Weston Pace <weston.p...@gmail.com> wrote:

> The Substrait decision for microseconds was made because, at the time, the
> goal was to keep the type system simple and universal, and there were
> systems that didn't support ns (e.g. Iceberg, postgres, duckdb, velox).
>
> A few options (off the top of my head):
>
>  1. Attempt to get a nanoseconds timestamp type adopted in Substrait.
>
> I'm not sure how much enthusiasm there will be for this.  I think Acero is
> the only consumer that would take advantage of this.  Perhaps Ibis or
> Datafusion would have some interest.  It would require changing an old
> Substrait agreement around the rules for which data types to use.
>
> 2. Treat timestamp(ns) as a variation of timestamp(us).
>
> I'm listing this for thoroughness; however, I don't think we can do this.
> Substrait requires timestamps to be able to go out to the year 9999, and a
> 64-bit count of nanoseconds from the epoch cannot do this.
>
> 3. Treat timestamp(ns) as a user-defined type (from Substrait's
> perspective)
>
> This is probably the easiest approach in terms of consensus-building.  The
> Substrait consumer should already have the plumbing for this in
> src/arrow/engine/substrait/extension_types.h.  I think getting Acero to
> work here will be pretty easy.  The trickier part might be adapting your
> producer (Ibis?).
>
> On Thu, Mar 9, 2023 at 9:43 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Hi,
> >
> > I recently came across some limitations in expressing timestamp type with
> > Substrait in the Acero substrait consumer and am curious to hear what
> > people's thoughts are.
> >
> > The particular issue that I have is when specifying timestamp type in
> > substrait, the unit is "microseconds" and there is no way to change that.
> > When integrating with Arrow, we often have timestamps in an internal
> > system that uses another unit, e.g., a Flight service that returns
> > timestamps in nanos. Interop with pandas is another gap, because pandas
> > uses nanoseconds internally.
> >
> > As a result, we currently often need to convert nanos <-> micros when a
> > Substrait plan is involved in specifying timestamps. It feels to me like
> > something is missing in Substrait, but I wonder what other people think.
> >
> > (Sending this to Arrow mailing list because I know some people here are
> > pretty involved with substrait and I am more familiar with the folks in
> > the Arrow community. Therefore wanted to get some thoughts from the
> > people here).
> >
> > Li
> >
>
