adriangb opened a new issue, #18326:
URL: https://github.com/apache/datafusion/issues/18326
### Is your feature request related to a problem or challenge?
```sql
COPY (SELECT '2025-10-25T00:15:00Z'::timestamptz AS ts) TO 'ts.parquet';
CREATE EXTERNAL TABLE t STORED AS PARQUET LOCATION 'ts.parquet';
EXPLAIN
SELECT * FROM t WHERE ts = '1761630189642';
```
Results in
```
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ FilterExec │ |
| | │ -------------------- │ |
| | │ predicate: │ |
| | │ ts = CAST(1761630189642 AS│ |
| | │ Timestamp(Nanosecond, │ |
| | │ Some("+00:00"))) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ RepartitionExec │ |
| | │ -------------------- │ |
| | │ partition_count(in->out): │ |
| | │ 1 -> 12 │ |
| | │ │ |
| | │ partitioning_scheme: │ |
| | │ RoundRobinBatch(12) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ files: 1 │ |
| | │ format: parquet │ |
| | │ │ |
| | │ predicate: │ |
| | │ ts = CAST(1761630189642 AS│ |
| | │ Timestamp(Nanosecond, │ |
| | │ Some("+00:00"))) │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+
```
Note that we keep the cast all the way down into the physical plan.
This makes it harder to catch and surface errors. For example if you remove
the `EXPLAIN` on this query you get:
```
Arrow error: Parser error: Error parsing timestamp from '1761630189642':
error parsing date
```
This is actually an error that happens inside of
`ParquetOpener(ArrowRowFilter(ParquetRowFilter(...)))`. If we applied casts to
literals during logical plan optimization we would:
1. Catch the error earlier and within DataFusion itself before we start
calling into `arrow-rs` and it calls us back.
2. Possibly be more efficient (I guess the cast gets evaluated for each
batch?).
### Describe the solution you'd like
Resolve literal casts during logical plan optimizations.
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]