Hey all,

I'm currently running some UX testing for a prototype DB engine integrating 
DataFusion, and one recurring point that crops up is that specifying literal 
timestamps, e.g. as gt/lt predicates in a where clause, is a bit awkward right 
now. Most of the testing is borrowing existing queries developed for DBs like 
Postgres, Redshift, and Snowflake. One example of a query that can't be run on 
DataFusion now:
> select * from some_table where timestamp_col >= '2021-01-01 00:00:00'

...which to my understanding, in most other DBs, is semantically equivalent to:
> select * from some_table where timestamp_col >= cast('2021-01-01 00:00:00' as 
> timestamp)

...as said DBs use target-typing for the string literal scalar. In DataFusion 
currently, the former approach will generate an error along the lines of 
"'Timestamp(Nanosecond, None) >= Utf8' can't be evaluated because there isn't a 
common type to coerce the types to" which makes sense. The latter approach also 
doesn't work, with an error of "Unsupported CAST from Utf8 to 
Timestamp(Nanosecond, None)", so I'm currently using DataFusion's unary 
to_timestamp function as a workaround.

My current (but relatively uninformed!) thinking is that we could add this 
transformation to the logical plan phase, and use available schema info to 
coerce string values to the correct type where required. However, the SQL AST 
=> logical plan step isn't one that I'm super familiar with, so if anyone has 
better suggestions of where to start, that'd be much appreciated. Otherwise, if 
there's broad approval for this approach, I can file this in JIRA and start 
work on it.

Cheers,
Ruan

Reply via email to