On Wed, 2021-06-02 at 13:56 -0700, Micah Kornfield wrote:
> >
> > Any SQL interface to Arrow should follow the SQL standard. So, for
> > instance, if a column has TIMESTAMP type, it should behave as a
> > date-time without a time-zone.
> >
> At least in bigquery we do the following mapping:
> SQL TIMESTAMP -> Arrow Timestamp with "UTC" timezone
> SQL DATETIME -> Arrow Timestamp without a time-zone
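For concreteness, that mapping corresponds to the following Arrow types; a
minimal sketch in pyarrow (the microsecond precision here is my assumption,
not something stated above):

    import pyarrow as pa

    # SQL TIMESTAMP -> Arrow Timestamp with "UTC" timezone
    sql_timestamp = pa.timestamp("us", tz="UTC")
    print(sql_timestamp)  # timestamp[us, tz=UTC]

    # SQL DATETIME -> Arrow Timestamp without a time-zone
    sql_datetime = pa.timestamp("us")
    print(sql_datetime)   # timestamp[us]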
Notably, the Arrow Date32/Date64 types have a rather different definition:
they represent "elapsed time since UNIX epoch". Does that mean a
user/developer is recommended to use Arrow Date64, rather than Arrow
Timestamp, to represent an absolute point in time? If so, on what
consideration did BigQuery not choose Arrow Date64 for its TIMESTAMP type?

In the SQL standard, a "DATE" should behave just like a "TIMESTAMP WITHOUT
TIME ZONE" aside from the time-of-day part. If "UNIX epoch" unambiguously
implies "UTC"/"GMT", then Arrow Date32 cannot align perfectly with the SQL
standard "DATE", which is supposed to be localized. However, if I am not
mistaken, some projects using Arrow have already adopted the Date<->Date32
equivalence, e.g. Spark SQL [1][2] (see the small sketch after the
references below). I am not sure whether this is a recommended practice,
but using the term "Date" for an offset from the "UNIX epoch" does seem a
bit counterintuitive to me.

Best,
Hongze

[1] https://github.com/apache/spark/blob/2658bc590fec51e2266d03121c85b47f553022ec/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala#L48
[2] https://github.com/apache/spark/blob/2658bc590fec51e2266d03121c85b47f553022ec/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L417-L430
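P.S. A small pyarrow sketch of the Date32 point above, with values I made
up for illustration (not taken from any of the projects mentioned):

    import datetime
    import pyarrow as pa

    # Date32 physically stores days elapsed since the UNIX epoch
    # (1970-01-01) as a 32-bit integer.
    arr = pa.array([datetime.date(1970, 1, 2)], type=pa.date32())
    print(arr.cast(pa.int32()))  # shows 1: one day since the epoch

    # No time zone is attached to the type, so whether "day 1" denotes
    # a UTC calendar day or a local one is left to the consumer --
    # which is the mismatch with the localized SQL DATE noted above.
    print(arr)  # shows the date 1970-01-02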