alamb commented on issue #14993:
URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2702084204

   I am reopening this ticket as I think it covers several important use cases 
(all of which are subsets of @adriangb 's example of `expensive_thing(col1)` above):
   
   * `EXTRACT (minute from "EventDate")`. For example,  @gatesn mentions that 
the [Vortex](https://github.com/spiraldb/vortex) format may be able to evaluate 
this more quickly on the compressed format than extracting the full expression
   * `struct_column["field_name"]`: For example, extracting one field from a 
struct column -- in this case we could potentially update the json or parquet 
decoders to avoid materializing other fields (we would likely need more 
arrow-rs support too)
   
   So a query might be
   ```sql
   select EXTRACT (minute from "EventDate"),  SUM(something) 
   FROM hits 
   GROUP BY EXTRACT (minute from "EventDate");
   ```
   
   Being able to evaluate the `EXTRACT (minute from "EventDate")` expression 
*during* the scan would be super helpful
   
   One possibility here might be to add an API to TableProvider similar to 
[`TableProvider::supports_filters_pushdown`](https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html#method.supports_filters_pushdown)
   
   maybe something like
   
   ```rust
   /// Returns true for all `Expr` in `expr` that can be directly evaluated by
   /// the TableProvider
   fn supports_expr_pushdown(
       &self,
       expr: &[&Expr],
   ) -> Result<Vec<bool>, DataFusionError>;
   ```
   
   This information would have to be threaded through to `TableProvider::scan` 
as well
   
   (maybe it would be time to make `TableProvider::scan_with_args` 🤔 )
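
   To make the idea a bit more concrete, here is a minimal, self-contained 
sketch of how the planner might ask a source which expressions it can evaluate 
and then pass the accepted ones along to the scan. Everything in it is a 
stand-in and an assumption on my part: the toy `Expr` enum, the 
`ToyTableProvider` trait, the `ScanArgs` struct, the `plan_scan` helper, and 
the `String` error type are hypothetical and are not DataFusion APIs.

   ```rust
   /// Toy stand-in for DataFusion's `Expr` (hypothetical, only what the sketch needs).
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       /// A plain column reference, e.g. `"EventDate"`.
       Column(String),
       /// e.g. `EXTRACT (minute from "EventDate")`
       Extract { part: String, input: Box<Expr> },
       /// e.g. `struct_column["field_name"]`
       GetField { input: Box<Expr>, field: String },
   }

   /// Hypothetical argument bundle for a `scan_with_args`-style method.
   #[derive(Debug, Default)]
   struct ScanArgs {
       /// Expressions the planner asks the source to evaluate during the scan.
       pushed_exprs: Vec<Expr>,
   }

   /// Toy stand-in for `TableProvider` (hypothetical).
   trait ToyTableProvider {
       /// Returns one `bool` per input expression, in the same order.
       fn supports_expr_pushdown(&self, exprs: &[&Expr]) -> Result<Vec<bool>, String>;
   }

   /// A source that can evaluate `EXTRACT` applied directly to a column.
   struct ExtractAwareSource;

   impl ToyTableProvider for ExtractAwareSource {
       fn supports_expr_pushdown(&self, exprs: &[&Expr]) -> Result<Vec<bool>, String> {
           Ok(exprs
               .iter()
               .map(|e| {
                   matches!(e, Expr::Extract { input, .. }
                       if matches!(**input, Expr::Column(_)))
               })
               .collect())
       }
   }

   /// Planner side: keep only the expressions the source said it can evaluate.
   fn plan_scan(provider: &dyn ToyTableProvider, exprs: &[Expr]) -> Result<ScanArgs, String> {
       let refs: Vec<&Expr> = exprs.iter().collect();
       let supported = provider.supports_expr_pushdown(&refs)?;
       let pushed_exprs = exprs
           .iter()
           .zip(supported)
           .filter(|(_, ok)| *ok)
           .map(|(e, _)| e.clone())
           .collect();
       Ok(ScanArgs { pushed_exprs })
   }
   ```

   With this shape, an `Extract` over a column would end up in 
`ScanArgs::pushed_exprs`, while a `GetField` expression that the source rejects 
would stay in the plan above the scan and be evaluated there as it is today.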


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

