adriangb commented on PR #15057:
URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2800002196

   I would like to resume this work.
   
   Some thoughts: should the rewrite happen via a new trait, as I'm currently 
doing, or should we add a method `PhysicalExpr::with_schema`?
   If we add `with_schema`, what schema do we pass it? The actual file schema? 
There's something to be said for that: it could rewrite filters to cast the 
literals / filters instead of casting the columns/arrays [as is currently 
done](https://github.com/pydantic/datafusion/blob/0b01fdf7f02f9097c319204058576f420b9790b4/datafusion/datasource-parquet/src/row_filter.rs#L146),
 which should be cheaper. I expect that any time it was okay to cast the data 
it was also okay to cast the predicate itself. It could also absorb the work of 
[reassign_predicate_columns](https://github.com/pydantic/datafusion/blob/0b01fdf7f02f9097c319204058576f420b9790b4/datafusion/datasource-parquet/src/row_filter.rs#L123)
 (we would implement it for `Column` such that if its index doesn't match but 
another one does, it swaps).
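
To make the idea concrete, here is a minimal sketch in plain Rust of what such a rewrite could look like. It is a toy model under stated assumptions, not DataFusion code: `Expr`, `Field`, `Literal`, `DataType`, `cast_literal`, and this free function `with_schema` are all hypothetical stand-ins. The sketch remaps a column's index by name when it doesn't match the file schema, and casts the literal side of a comparison to the column's file type instead of casting the column. Returning `None` when a literal doesn't fit the narrower type is also an assumption of the sketch.

```rust
// Toy model only: none of these types are DataFusion's real API.
#[derive(Debug, Clone, PartialEq)]
enum DataType { Int32, Int64 }

#[derive(Debug, Clone, PartialEq)]
enum Literal { Int32(i32), Int64(i64) }

#[derive(Debug, Clone, PartialEq)]
struct Field { name: String, data_type: DataType }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, index: usize },
    Literal(Literal),
    Eq(Box<Expr>, Box<Expr>),
}

/// Cast a literal once, at rewrite time. Returns `None` if the value
/// does not fit in the target type.
fn cast_literal(lit: &Literal, to: &DataType) -> Option<Literal> {
    match (lit, to) {
        (Literal::Int32(v), DataType::Int32) => Some(Literal::Int32(*v)),
        (Literal::Int64(v), DataType::Int64) => Some(Literal::Int64(*v)),
        (Literal::Int32(v), DataType::Int64) => Some(Literal::Int64(i64::from(*v))),
        (Literal::Int64(v), DataType::Int32) => i32::try_from(*v).ok().map(Literal::Int32),
    }
}

/// Hypothetical stand-in for a `with_schema` rewrite against the file schema.
/// Returns `None` when the expression cannot adapt itself.
fn with_schema(expr: &Expr, file_schema: &[Field]) -> Option<Expr> {
    match expr {
        Expr::Column { name, index } => {
            // Keep the index if it already points at the right field,
            // otherwise swap to the field with the matching name.
            let matches = file_schema.get(*index).map_or(false, |f| f.name == *name);
            if matches {
                Some(expr.clone())
            } else {
                let new_index = file_schema.iter().position(|f| f.name == *name)?;
                Some(Expr::Column { name: name.clone(), index: new_index })
            }
        }
        Expr::Literal(_) => Some(expr.clone()),
        Expr::Eq(left, right) => {
            let l = with_schema(left, file_schema)?;
            let r = with_schema(right, file_schema)?;
            // For `column = literal`, cast the literal to the column's file type.
            let cast = if let (Expr::Column { index, .. }, Expr::Literal(lit)) = (&l, &r) {
                Some(cast_literal(lit, &file_schema[*index].data_type)?)
            } else {
                None
            };
            let r = match cast {
                Some(lit) => Expr::Literal(lit),
                None => r,
            };
            Some(Expr::Eq(Box::new(l), Box::new(r)))
        }
    }
}
```

With a file schema of `[b: Int64, a: Int32]`, the predicate `a@0 = 5i64` would come back as `a@1 = 5i32` in this model: the literal is cast once rather than every batch of the column.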
   
   I suspect the hard bit with this approach will be edge cases: what if a 
filter _cannot_ adapt itself to the file schema, but we could cast the column 
to make it work? I'm thinking of something like a UDF that only accepts `Utf8` 
when the file produces `Utf8View` 🤔 
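
One possible shape for that fallback, again as a toy model in plain Rust rather than DataFusion code: `Expr`, `DataType`, `produced_type`, and `adapt` are hypothetical names, and the choice to wrap the argument in a cast is an assumption of this sketch, not something settled in the PR. Columns are re-typed to what the file produces, and where a UDF cannot accept that type, the column is cast back to the type the UDF expects.

```rust
// Toy model only; these types are hypothetical, not DataFusion's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType { Utf8, Utf8View }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, data_type: DataType },
    Cast { input: Box<Expr>, to: DataType },
    // A UDF that accepts exactly one argument type.
    Udf { name: String, accepts: DataType, returns: DataType, arg: Box<Expr> },
}

/// The type an expression produces in this toy model.
fn produced_type(expr: &Expr) -> DataType {
    match expr {
        Expr::Column { data_type, .. } => *data_type,
        Expr::Cast { to, .. } => *to,
        Expr::Udf { returns, .. } => *returns,
    }
}

/// Adapt `expr` to the file schema. Returns `None` if a column is missing.
fn adapt(expr: &Expr, file_schema: &[(&str, DataType)]) -> Option<Expr> {
    match expr {
        Expr::Column { name, .. } => {
            // Use whatever type the file actually produces for this column.
            let (_, dt) = file_schema.iter().find(|(n, _)| *n == name.as_str())?;
            Some(Expr::Column { name: name.clone(), data_type: *dt })
        }
        Expr::Cast { input, to } => Some(Expr::Cast {
            input: Box::new(adapt(input, file_schema)?),
            to: *to,
        }),
        Expr::Udf { name, accepts, returns, arg } => {
            let arg = adapt(arg, file_schema)?;
            // The UDF cannot adapt itself, so fall back to casting its input.
            let arg = if produced_type(&arg) == *accepts {
                arg
            } else {
                Expr::Cast { input: Box::new(arg), to: *accepts }
            };
            Some(Expr::Udf {
                name: name.clone(),
                accepts: *accepts,
                returns: *returns,
                arg: Box::new(arg),
            })
        }
    }
}
```

In this model, a UDF over column `s` that only accepts `Utf8` ends up with the argument `CAST(s AS Utf8)` when the file schema reports `s` as `Utf8View`, so the cheaper literal-side rewrite is used where possible and the column cast is kept only where it is unavoidable.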
   
   
   I think @jayzhan-synnada proposed something similar in 
https://github.com/apache/datafusion/pull/15685/files#diff-2b3f5563d9441d3303b57e58e804ab07a10d198973eed20e7751b5a20b955e42.
   
   @alamb any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

