adriangb commented on code in PR #17273: URL: https://github.com/apache/datafusion/pull/17273#discussion_r2292053325
########## datafusion/catalog/src/memory/table.rs: ########## @@ -138,6 +138,7 @@ impl MemTable { ) -> Result<Self> { let schema = t.schema(); let constraints = t.constraints(); + #[expect(deprecated)] Review Comment: We could just not deprecate the existing method to (1) reduce the diff here and (2) make sure we want to commit to the new method before deprecating the old one ########## datafusion/catalog/src/table.rs: ########## @@ -171,6 +173,35 @@ pub trait TableProvider: Debug + Sync + Send { limit: Option<usize>, ) -> Result<Arc<dyn ExecutionPlan>>; + async fn scan_with_args( + &self, + state: &dyn Session, + args: ScanArgs, + ) -> Result<ScanResult> { + let ScanArgs { + preferred_ordering: _, + filters, + projection, + limit, + } = args; + let filters = filters.unwrap_or_default(); + let unsupported_filters = self + .supports_filters_pushdown(&filters.iter().collect_vec())? Review Comment: I'm tempted to also do something about `supports_filters_pushdown`. I think it could be folded into `scan()` (and somewhat already set it up that way since `scan_with_args` gets to return unsupported/inexact filters). I would rework the filter pushdown optimizer so that it pushes all filters into the TableScan structure (without caring about supported / unsupported) and then then when we go from logical -> physical plan we apply a FilterExec for any non-exact filters and a ProjectionExec for the projections. ########## datafusion/core/src/datasource/listing/table.rs: ########## @@ -1195,21 +1211,36 @@ impl TableProvider for ListingTable { // if no files need to be read, return an `EmptyExec` if partitioned_file_lists.is_empty() { - let projected_schema = project_schema(&self.schema(), projection)?; - return Ok(Arc::new(EmptyExec::new(projected_schema))); + let projected_schema = project_schema(&self.schema(), projection.as_ref())?; + return Ok(ScanResult::new( + Arc::new(EmptyExec::new(projected_schema)), + filters.clone(), + )); } - let output_ordering = self.try_create_output_ordering()?; + let known_file_ordering = self.try_create_output_ordering()?; + let desired_file_ordering = match args.preferred_ordering() { + Some(ordering) if !ordering.is_empty() => { + // Prefer the ordering requested by the query to any inherint file ordering + create_ordering(&self.table_schema, &[ordering.to_vec()])? + .first() + .cloned() + } + Some(_) | None => { + // If the query did not request a specific ordering, fall back to any inherent file ordering + known_file_ordering.first().cloned() + } + }; Review Comment: Not too sure about correctness here. The `Vec<Vec<SortExpr>>` is a bit wonky. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org