Hey all,
I'm currently running some UX testing for a prototype DB engine integrating
DataFusion, and one recurring point that crops up is that specifying literal
timestamps, e.g. as gt/lt predicates in a where clause, is a bit awkward right
now. Most of the testing is borrowing existing queries developed for DBs like
Postgres, Redshift, and Snowflake. One example of a query that can't be run on
DataFusion now:
> select * from some_table where timestamp_col >= '2021-01-01 00:00:00'
...which to my understanding, in most other DBs, is semantically equivalent to:
> select * from some_table where timestamp_col >= cast('2021-01-01 00:00:00' as
> timestamp)
...as said DBs use target-typing for the string literal scalar. In DataFusion
currently, the former approach will generate an error along the lines of
"'Timestamp(Nanosecond, None) >= Utf8' can't be evaluated because there isn't a
common type to coerce the types to" which makes sense. The latter approach also
doesn't work, with an error of "Unsupported CAST from Utf8 to
Timestamp(Nanosecond, None)", so I'm currently using DataFusion's unary
to_timestamp function as a workaround.
My current (but relatively uninformed!) thinking is that we could add this
transformation to the logical plan phase, and use available schema info to
coerce string values to the correct type where required. However, the SQL AST
=> logical plan step isn't one that I'm super familiar with, so if anyone has
better suggestions of where to start, that'd be much appreciated. Otherwise, if
there's broad approval for this approach, I can file this in JIRA and start
work on it.
Cheers,
Ruan