adriangb commented on PR #15057: URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2800002196
I would like to resume this work. Some thoughts: should the rewrite happen via a new trait, as I'm currently doing, or should we add a method `PhysicalExpr::with_schema`?

If we add `with_schema`, what schema do we pass it? The actual file schema? There's something to be said for that: it could rewrite filters to cast the literals / filters instead of casting the columns/arrays [as is currently done](https://github.com/pydantic/datafusion/blob/0b01fdf7f02f9097c319204058576f420b9790b4/datafusion/datasource-parquet/src/row_filter.rs#L146), which should be cheaper, since a literal is cast once rather than casting every batch of column data. I expect that any time it was okay to cast the data, it was also okay to cast the predicate itself. It could also absorb the work of [reassign_predicate_columns](https://github.com/pydantic/datafusion/blob/0b01fdf7f02f9097c319204058576f420b9790b4/datafusion/datasource-parquet/src/row_filter.rs#L123) (we implement it for `Column` so that, if its index doesn't match but another one does, it swaps).

I suspect the hard bit with this approach will be edge cases: what if a filter _cannot_ adapt itself to the file schema, but we could cast the column to make it work? I'm thinking of something like a UDF that only accepts `Utf8` while the file produces `Utf8View` 🤔

I think @jayzhan-synnada proposed something similar in https://github.com/apache/datafusion/pull/15685/files#diff-2b3f5563d9441d3303b57e58e804ab07a10d198973eed20e7751b5a20b955e42.

@alamb any thoughts?
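A minimal sketch of the "pass the file schema" idea, assuming a toy expression model: `DataType`, `ScalarValue`, `Field`, `Expr`, `lookup` and `with_schema` below are hypothetical stand-ins, not DataFusion's actual `PhysicalExpr` API. It re-points columns by name (the `reassign_predicate_columns` idea), casts the literal to the file's type when that is lossless, and falls back to casting the column data when it is not.

```rust
// Hypothetical toy model: these types are NOT DataFusion's real API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int32(i32),
    Int64(i64),
}

impl ScalarValue {
    fn data_type(&self) -> DataType {
        match self {
            ScalarValue::Int32(_) => DataType::Int32,
            ScalarValue::Int64(_) => DataType::Int64,
        }
    }

    /// Cast the literal to `target` only if no information is lost.
    fn try_cast(&self, target: DataType) -> Option<ScalarValue> {
        match (self, target) {
            (v, t) if v.data_type() == t => Some(v.clone()),
            (ScalarValue::Int32(v), DataType::Int64) => Some(ScalarValue::Int64(*v as i64)),
            (ScalarValue::Int64(v), DataType::Int32) => {
                if *v >= i32::MIN as i64 && *v <= i32::MAX as i64 {
                    Some(ScalarValue::Int32(*v as i32))
                } else {
                    None // out of range: the literal cannot be cast losslessly
                }
            }
            _ => None,
        }
    }
}

struct Field {
    name: &'static str,
    data_type: DataType,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, index: usize },
    Literal(ScalarValue),
    Eq(Box<Expr>, Box<Expr>),
    Cast(Box<Expr>, DataType),
}

/// Find a field by name, returning its position and definition.
fn lookup<'a>(schema: &'a [Field], name: &str) -> Option<(usize, &'a Field)> {
    schema.iter().enumerate().find(|(_, f)| f.name == name)
}

/// Rewrite `expr`, written against `table_schema`, so it evaluates against `file_schema`.
/// Returns None if a referenced column is missing from either schema.
fn with_schema(expr: &Expr, table_schema: &[Field], file_schema: &[Field]) -> Option<Expr> {
    match expr {
        Expr::Column { name, .. } => {
            // Re-point the column at its position in the file, matching by name.
            let (index, file_field) = lookup(file_schema, name)?;
            let column = Expr::Column { name: name.clone(), index };
            let table_type = lookup(table_schema, name)?.1.data_type;
            // A bare column must still yield the table type, so cast the data if needed.
            if file_field.data_type == table_type {
                Some(column)
            } else {
                Some(Expr::Cast(Box::new(column), table_type))
            }
        }
        Expr::Literal(v) => Some(Expr::Literal(v.clone())),
        Expr::Eq(l, r) => {
            if let (Expr::Column { name, .. }, Expr::Literal(v)) = (&**l, &**r) {
                let (index, file_field) = lookup(file_schema, name)?;
                // Preferred: cast the literal once instead of casting every column value.
                if let Some(lit) = v.try_cast(file_field.data_type) {
                    let column = Expr::Column { name: name.clone(), index };
                    return Some(Expr::Eq(Box::new(column), Box::new(Expr::Literal(lit))));
                }
            }
            // Fallback: adapt each side on its own, which casts the column data instead.
            let l = with_schema(l, table_schema, file_schema)?;
            let r = with_schema(r, table_schema, file_schema)?;
            Some(Expr::Eq(Box::new(l), Box::new(r)))
        }
        Expr::Cast(inner, t) => {
            Some(Expr::Cast(Box::new(with_schema(inner, table_schema, file_schema)?), *t))
        }
    }
}
```

With a table column `b: Int64` stored in the file as `b: Int32`, `b = 5_i64` becomes `b = 5_i32` against the file's column index, while `b = i64::MAX` cannot be narrowed and falls back to `CAST(b AS Int64) = i64::MAX`. The sketch only narrows literals for equality; range comparisons such as `<` with an out-of-range literal would need their own handling, which is the kind of edge case raised above.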