adriangb commented on code in PR #17273:
URL: https://github.com/apache/datafusion/pull/17273#discussion_r2292053325


##########
datafusion/catalog/src/memory/table.rs:
##########
@@ -138,6 +138,7 @@ impl MemTable {
     ) -> Result<Self> {
         let schema = t.schema();
         let constraints = t.constraints();
+        #[expect(deprecated)]

Review Comment:
   We could just not deprecate the existing method to (1) reduce the diff here 
and (2) make sure we want to commit to the new method before deprecating the 
old one



##########
datafusion/catalog/src/table.rs:
##########
@@ -171,6 +173,35 @@ pub trait TableProvider: Debug + Sync + Send {
         limit: Option<usize>,
     ) -> Result<Arc<dyn ExecutionPlan>>;
 
+    async fn scan_with_args(
+        &self,
+        state: &dyn Session,
+        args: ScanArgs,
+    ) -> Result<ScanResult> {
+        let ScanArgs {
+            preferred_ordering: _,
+            filters,
+            projection,
+            limit,
+        } = args;
+        let filters = filters.unwrap_or_default();
+        let unsupported_filters = self
+            .supports_filters_pushdown(&filters.iter().collect_vec())?

Review Comment:
   I'm tempted to also do something about `supports_filters_pushdown`. I think 
it could be folded into `scan()` (and somewhat already set it up that way since 
`scan_with_args` gets to return unsupported/inexact filters).
   
   I would rework the filter pushdown optimizer so that it pushes all filters 
into the TableScan structure (without caring about supported / unsupported) and 
then then when we go from logical -> physical plan we apply a FilterExec for 
any non-exact filters and a ProjectionExec for the projections.



##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -1195,21 +1211,36 @@ impl TableProvider for ListingTable {
 
         // if no files need to be read, return an `EmptyExec`
         if partitioned_file_lists.is_empty() {
-            let projected_schema = project_schema(&self.schema(), projection)?;
-            return Ok(Arc::new(EmptyExec::new(projected_schema)));
+            let projected_schema = project_schema(&self.schema(), 
projection.as_ref())?;
+            return Ok(ScanResult::new(
+                Arc::new(EmptyExec::new(projected_schema)),
+                filters.clone(),
+            ));
         }
 
-        let output_ordering = self.try_create_output_ordering()?;
+        let known_file_ordering = self.try_create_output_ordering()?;
+        let desired_file_ordering = match args.preferred_ordering() {
+            Some(ordering) if !ordering.is_empty() => {
+                // Prefer the ordering requested by the query to any inherint 
file ordering
+                create_ordering(&self.table_schema, &[ordering.to_vec()])?
+                    .first()
+                    .cloned()
+            }
+            Some(_) | None => {
+                // If the query did not request a specific ordering, fall back 
to any inherent file ordering
+                known_file_ordering.first().cloned()
+            }
+        };

Review Comment:
   Not too sure about correctness here. The `Vec<Vec<SortExpr>>` is a bit wonky.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to