alamb opened a new issue, #14987:
URL: https://github.com/apache/datafusion/issues/14987

   ### Is your feature request related to a problem or challenge?
   
   As described by @ion-elgreco in https://github.com/apache/datafusion/issues/14944:
   
   Given a dataset with an `Int64` column named `month_id`, consider a predicate such as the following:
   
   ```sql
   month_id = '202502'
   ```
   
   In SQL and DataFrame queries, this is simplified to the following. Note that the constant is cast and the column is not:
   
   ```sql
   month_id = cast('202502', 'Int64')
   ```
   
   However, when using [`SessionContext::create_physical_expr`](https://github.com/apache/datafusion/blob/5e49094c159ce110bebd2bb6f4858ff515cd1860/datafusion-examples/examples/expr_api.rs#L540-L543) to create a physical expression directly, as delta.rs and other systems such as LanceDB do, the expression instead looks like this (the cast is on the column):
   
   ```sql
   cast(month_id, 'Utf8') = '202502'
   ```
   
   This is bad for two reasons:
   1. `PruningPredicate` can't handle this type of expression, so it can't be used to prune Parquet row groups.
   2. Evaluating the filter is substantially slower: it must cast *every* value of `month_id` to a string before it can evaluate the comparison, and it then performs a slow string comparison instead of a much faster `Int64` comparison.
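To make reason 1 concrete, here is a minimal Rust sketch of min/max row-group pruning for an equality predicate. It is not DataFusion's `PruningPredicate` API; `RowGroupStats` and `may_contain_eq` are hypothetical names used only for illustration. It shows that `month_id = 202502` can be checked directly against `Int64` statistics, whereas casting integers to strings does not preserve ordering (`"9" > "100"`), so `Int64` min/max values cannot bound the values of the string-cast column.

```rust
/// Hypothetical min/max statistics for an Int64 column in one row group.
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Returns `false` only when the row group provably has no row with
/// `col = value` (value outside [min, max]); `true` means it must be scanned.
fn may_contain_eq(stats: &RowGroupStats, value: i64) -> bool {
    stats.min <= value && value <= stats.max
}

fn main() {
    let stats = RowGroupStats { min: 202401, max: 202412 };

    // `month_id = 202502`: the Int64 literal is compared directly against the
    // Int64 statistics, so this row group can be skipped.
    assert!(!may_contain_eq(&stats, 202502));
    assert!(may_contain_eq(&stats, 202406));

    // A string cast of the column does not preserve ordering, so Int64
    // min/max say nothing about the range of the cast (string) values.
    let (small, large) = (9_i64, 100_i64);
    assert!(small < large); // numeric order
    assert!(small.to_string() > large.to_string()); // "9" > "100" as strings
    println!("row group pruned: {}", !may_contain_eq(&stats, 202502));
}
```

In this sketch, `true` only means "must still be scanned"; a row group can be skipped only when the function returns `false`.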
   
   
   The reason this happens is that the rewrite from `cast(month_id, 'Utf8') = '202502'` to `month_id = Cast('202502', Int64)` happens in the logical optimizer, specifically in the `unwrap_cast_in_comparison` rule here:
https://github.com/apache/datafusion/blob/2fcab2ef0da474ec000d7410427b9d18afb5820b/datafusion/optimizer/src/unwrap_cast_in_comparison.rs#L39-L77
   
   However, this pass is not run as part of `SessionContext::create_physical_expr`.
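The rewrite itself can be sketched on a toy expression type. This is a minimal, hypothetical Rust illustration: the `Expr` enum, `unwrap_cast_eq`, and the helpers are made up for this example and are neither DataFusion's API nor its `unwrap_cast_in_comparison` implementation. The key subtlety is that the rewrite is only valid when the string literal is exactly what casting some `Int64` value to a string would produce: `'202502'` round-trips, but `'0202502'` does not (no `Int64` value formats with a leading zero), so that comparison must not become `month_id = 202502`.

```rust
// Toy logical types and expressions (hypothetical, for illustration only;
// not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, data_type: DataType },
    LitInt64(i64),
    LitUtf8(String),
    Cast { expr: Box<Expr>, to: DataType },
    Eq(Box<Expr>, Box<Expr>),
}

/// Rewrites `cast(int_col, 'Utf8') = 'literal'` into `int_col = <Int64 literal>`
/// when the two are equivalent; otherwise returns the expression unchanged.
fn unwrap_cast_eq(expr: Expr) -> Expr {
    if let Expr::Eq(left, right) = &expr {
        if let (Expr::Cast { expr: inner, to: DataType::Utf8 }, Expr::LitUtf8(s)) =
            (&**left, &**right)
        {
            if let Expr::Column { data_type: DataType::Int64, .. } = &**inner {
                // Only unwrap if the literal round-trips exactly: it must parse
                // as an i64 and format back to the identical string. This
                // rejects '0202502', '+202502', '-0' and non-numeric strings,
                // none of which any Int64 value can ever cast to.
                if let Ok(v) = s.parse::<i64>() {
                    if v.to_string() == *s {
                        return Expr::Eq(inner.clone(), Box::new(Expr::LitInt64(v)));
                    }
                }
            }
        }
    }
    expr
}

/// Helper: an Int64 column reference.
fn col(name: &str) -> Expr {
    Expr::Column { name: name.to_string(), data_type: DataType::Int64 }
}

/// Helper: builds `cast(month_id, 'Utf8') = '<lit>'`.
fn cast_eq(lit: &str) -> Expr {
    Expr::Eq(
        Box::new(Expr::Cast { expr: Box::new(col("month_id")), to: DataType::Utf8 }),
        Box::new(Expr::LitUtf8(lit.to_string())),
    )
}

fn main() {
    // cast(month_id, 'Utf8') = '202502'  ==>  month_id = 202502
    let rewritten = unwrap_cast_eq(cast_eq("202502"));
    assert_eq!(
        rewritten,
        Expr::Eq(Box::new(col("month_id")), Box::new(Expr::LitInt64(202502)))
    );
    // '0202502' is never produced by casting an Int64, so it is left alone.
    assert_eq!(unwrap_cast_eq(cast_eq("0202502")), cast_eq("0202502"));
    println!("{:?}", rewritten);
}
```

This toy only handles a column on the left-hand side of `=`; a real rule would also need to handle the literal on the left and other operators. Ordering comparisons (`<`, `>`) between a string-cast integer column and a string literal cannot be unwrapped this way, because string ordering differs from numeric ordering.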
   
   ### Describe the solution you'd like
   
   I would like the expressions created by `SessionContext::create_physical_expr` to have their casts unwrapped as well.
   
   ### Describe alternatives you've considered
   
   The ideal solution, in my mind, is to remove the `unwrap_cast_in_comparison` pass entirely and instead unwrap casts in comparisons as part of expression simplification, in [`ExprSimplifier`](https://docs.rs/datafusion/latest/datafusion/optimizer/simplify_expressions/expr_simplifier/struct.ExprSimplifier.html).
   
   
   
   ### Additional context
   
   - https://github.com/delta-io/delta-rs/issues/3278
   - https://github.com/apache/datafusion/issues/14944

