On Wed, 2021-06-02 at 13:56 -0700, Micah Kornfield wrote:
> > 
> > Any SQL interface to Arrow should follow the SQL standard. So, for
> > instance, if a column has TIMESTAMP type, it should behave as a
> > date-time without a time-zone.
> 
> 
> At least in bigquery we do the following mapping:
> SQL TIMESTAMP -> Arrow Timestamp with "UTC" timezone
> SQL DATETIME -> Arrow Timestamp without a time-zone.
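
If I read that correctly, the mapping in pyarrow terms would be
roughly the following (the microsecond unit is my assumption, not
necessarily what BigQuery actually emits):

    import pyarrow as pa

    # SQL TIMESTAMP: an instant, so a timestamp with an explicit zone
    sql_timestamp = pa.timestamp('us', tz='UTC')
    # SQL DATETIME: a wall-clock value, so a timestamp with no zone
    sql_datetime = pa.timestamp('us')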

Meanwhile, the Arrow Date32/Date64 types have a more interesting
definition: they represent "elapsed time since the UNIX epoch". Does
that mean a user/developer is recommended to use Arrow Date64 rather
than Arrow Timestamp to indicate an absolute point in time? If so, on
what grounds did BigQuery not choose Arrow Date64 for its TIMESTAMP
type?
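
For reference, a quick pyarrow sketch of how the two date types are
laid out (this is just the Arrow definition, nothing BigQuery-
specific):

    import datetime
    import pyarrow as pa

    # Date32 stores whole days elapsed since the UNIX epoch
    # (1970-01-01); Date64 stores milliseconds since the epoch but
    # is still date-valued.
    d32 = pa.array([datetime.date(2021, 6, 2)], type=pa.date32())
    d64 = d32.cast(pa.date64())
    print(d32[0], d64[0])  # both display as 2021-06-02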

In the SQL standard, a "DATE" should behave just like a "TIMESTAMP
WITHOUT TIME ZONE" with the time-of-day part dropped. If "UNIX epoch"
unambiguously implies "UTC"/"GMT", then Arrow Date32 cannot perfectly
align with the SQL standard "DATE", which is supposed to be localized.
However, if I am not wrong, some projects using Arrow have already
adopted the Date<->Date32 equivalence, such as Spark SQL[1][2]. I am
not sure whether that is a recommended practice, but in Arrow, using
the term "Date" for "elapsed time since the UNIX epoch" does seem a
bit counterintuitive to me.
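
To make the concern concrete: a Date32 value is just a signed day
count with no time zone attached, so whether that count was derived
from a UTC calendar or a local one is up to the writer, not the
format. A small pyarrow sketch:

    import pyarrow as pa

    # 18780 days after 1970-01-01 is 2021-06-02; nothing in the type
    # says which time zone's midnight that day began at.
    arr = pa.array([18780], type=pa.int32()).cast(pa.date32())
    print(arr[0])  # 2021-06-02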

Best,
Hongze

[1]
https://github.com/apache/spark/blob/2658bc590fec51e2266d03121c85b47f553022ec/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala#L48
[2]
https://github.com/apache/spark/blob/2658bc590fec51e2266d03121c85b47f553022ec/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L417-L430
