adriangb opened a new issue, #18326:
URL: https://github.com/apache/datafusion/issues/18326

   ### Is your feature request related to a problem or challenge?
   
   ```sql
   COPY (SELECT '2025-10-25T00:15:00Z'::timestamptz AS ts) TO 'ts.parquet';
   CREATE EXTERNAL TABLE t STORED AS PARQUET LOCATION 'ts.parquet';
   EXPLAIN
   SELECT * FROM t WHERE ts = '1761630189642';
   ```
   
   Results in
   
   ```
   +---------------+-------------------------------+
   | plan_type     | plan                          |
   +---------------+-------------------------------+
   | physical_plan | ┌───────────────────────────┐ |
   |               | │    CoalesceBatchesExec    │ |
   |               | │    --------------------   │ |
   |               | │     target_batch_size:    │ |
   |               | │            8192           │ |
   |               | └─────────────┬─────────────┘ |
   |               | ┌─────────────┴─────────────┐ |
   |               | │         FilterExec        │ |
   |               | │    --------------------   │ |
   |               | │         predicate:        │ |
   |               | │ ts = CAST(1761630189642 AS│ |
   |               | │    Timestamp(Nanosecond,  │ |
   |               | │      Some("+00:00")))     │ |
   |               | └─────────────┬─────────────┘ |
   |               | ┌─────────────┴─────────────┐ |
   |               | │      RepartitionExec      │ |
   |               | │    --------------------   │ |
   |               | │ partition_count(in->out): │ |
   |               | │          1 -> 12          │ |
   |               | │                           │ |
   |               | │    partitioning_scheme:   │ |
   |               | │    RoundRobinBatch(12)    │ |
   |               | └─────────────┬─────────────┘ |
   |               | ┌─────────────┴─────────────┐ |
   |               | │       DataSourceExec      │ |
   |               | │    --------------------   │ |
   |               | │          files: 1         │ |
   |               | │      format: parquet      │ |
   |               | │                           │ |
   |               | │         predicate:        │ |
   |               | │ ts = CAST(1761630189642 AS│ |
   |               | │    Timestamp(Nanosecond,  │ |
   |               | │      Some("+00:00")))     │ |
   |               | └───────────────────────────┘ |
   |               |                               |
   +---------------+-------------------------------+
   ```
   
   Note that the cast on the literal is kept all the way down into the physical plan: it appears in both the `FilterExec` predicate and the `DataSourceExec` predicate.
   
   This makes it harder to catch and surface errors. For example, if you remove the `EXPLAIN` from this query, you get:
   
   ```
   Arrow error: Parser error: Error parsing timestamp from '1761630189642': error parsing date
   ```
   
   This error is actually raised inside `ParquetOpener(ArrowRowFilter(ParquetRowFilter(...)))`. If we applied casts to literals during logical plan optimization, we would:
   1. Catch the error earlier, within DataFusion itself, before we start calling into `arrow-rs` and it calls us back.
   2. Possibly be more efficient (I guess the cast gets evaluated for each batch?).
   
   ### Describe the solution you'd like
   
   Resolve literal casts during logical plan optimizations.
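
To make the idea concrete, here is a minimal, self-contained sketch of folding casts on literals in a toy expression tree. It is not DataFusion's optimizer or its API: `Expr`, `Literal`, `DataType`, `PlanError`, `cast_literal`, and `fold_literal_casts` are all hypothetical names invented for illustration, and a string-to-`Int64` cast stands in for the string-to-timestamp cast from the query above.

```rust
/// Toy data types, standing in for Arrow data types (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
}

/// Toy literal values (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Utf8(String),
    Int64(i64),
}

/// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(Literal),
    Cast { expr: Box<Expr>, to: DataType },
    Eq(Box<Expr>, Box<Expr>),
}

/// Error reported at planning time rather than during execution.
#[derive(Debug, PartialEq)]
struct PlanError(String);

/// Evaluate a cast of a single literal exactly once.
fn cast_literal(lit: &Literal, to: &DataType) -> Result<Literal, PlanError> {
    match (lit, to) {
        (Literal::Utf8(s), DataType::Int64) => s
            .trim()
            .parse::<i64>()
            .map(Literal::Int64)
            .map_err(|e| PlanError(format!("cannot cast '{s}' to Int64: {e}"))),
        (Literal::Int64(v), DataType::Utf8) => Ok(Literal::Utf8(v.to_string())),
        // Casting to the same type is a no-op.
        (Literal::Utf8(_), DataType::Utf8) | (Literal::Int64(_), DataType::Int64) => {
            Ok(lit.clone())
        }
    }
}

/// Walk the tree bottom-up and replace `Cast(Literal)` with the cast result.
/// Casts over non-literal inputs (for example columns) are left untouched.
fn fold_literal_casts(expr: Expr) -> Result<Expr, PlanError> {
    match expr {
        Expr::Cast { expr, to } => match fold_literal_casts(*expr)? {
            Expr::Literal(lit) => Ok(Expr::Literal(cast_literal(&lit, &to)?)),
            other => Ok(Expr::Cast { expr: Box::new(other), to }),
        },
        Expr::Eq(left, right) => Ok(Expr::Eq(
            Box::new(fold_literal_casts(*left)?),
            Box::new(fold_literal_casts(*right)?),
        )),
        leaf => Ok(leaf),
    }
}
```

In this sketch, a predicate such as `ts = CAST('42' AS Int64)` is rewritten to `ts = 42` before any data is read, and a literal that cannot be cast produces a `PlanError` from the rewrite pass itself instead of surfacing from inside the scan.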
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_



