alamb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2702084204
I am reopening this ticket as I think it covers several important use cases (all of which are subsets of @adriangb's example of `expensive_thing(col1)` above):

* `EXTRACT (minute from "EventDate")`: For example, @gatesn mentions that the [Vortex](https://github.com/spiraldb/vortex) format may be able to evaluate this more quickly on the compressed format than extracting the full expression
* `struct_column["field_name"]`: For example, extracting one field from a struct column -- in this case we could potentially update the json or parquet decoders to avoid materializing other fields (we would likely need more arrow-rs support too)

So a query might be

```sql
select EXTRACT (minute from "EventDate"), SUM(something)
FROM hits
GROUP BY EXTRACT (minute from "EventDate");
```

Being able to evaluate the `EXTRACT (minute from "EventDate")` expression *during* the scan would be super helpful.

One possibility here might be to add an API to TableProvider similar to [`TableProvider::supports_filters_pushdown`](https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html#method.supports_filters_pushdown), maybe something like

```rust
/// Returns true for all `Expr` in `expr` that can be directly evaluated by the TableProvider
fn supports_expr_pushdown(
    &self,
    expr: &[&Expr],
) -> Result<Vec<bool>, DataFusionError>;
```

This information would have to be threaded through to `TableProvider::scan` as well (maybe it would be time to make `TableProvider::scan_with_args` 🤔)
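To make the idea a bit more concrete, here is a minimal sketch of how a provider might answer such a call. Everything in it is an assumption for illustration only: the `Expr` enum is a heavily simplified stand-in (not DataFusion's real `Expr`), `ExprPushdown` and `ToyProvider` are hypothetical names, and the error type is a plain `String` rather than `DataFusionError`. It only shows the shape of the per-expression yes/no answer, mirroring how `supports_filters_pushdown` answers per filter.

```rust
/// Hypothetical, heavily simplified stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A plain column reference, e.g. `"EventDate"`.
    Column(String),
    /// `EXTRACT(<part> FROM <input>)`.
    Extract { part: String, input: Box<Expr> },
    /// `<input>["field"]`.
    GetField { input: Box<Expr>, field: String },
    /// Any other scalar function call, e.g. `expensive_thing(col1)`.
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Hypothetical trait carrying only the proposed method.
trait ExprPushdown {
    /// Returns one flag per input expression: `true` if the provider
    /// can evaluate that expression itself during the scan.
    fn supports_expr_pushdown(&self, expr: &[&Expr]) -> Result<Vec<bool>, String>;
}

/// Toy provider that claims support only for `EXTRACT` and struct field
/// access applied directly to a column.
struct ToyProvider;

impl ExprPushdown for ToyProvider {
    fn supports_expr_pushdown(&self, expr: &[&Expr]) -> Result<Vec<bool>, String> {
        Ok(expr
            .iter()
            .map(|e| match e {
                Expr::Extract { input, .. } | Expr::GetField { input, .. } => {
                    // Only accept the simple case where the input is a bare column.
                    matches!(**input, Expr::Column(_))
                }
                // Everything else is left for the engine to evaluate after the scan.
                _ => false,
            })
            .collect())
    }
}
```

In this sketch the planner would keep evaluating any expression flagged `false` above the scan, and would still need some way to hand the accepted expressions to the scan itself, which is the threading question raised above.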