alamb commented on code in PR #17273:
URL: https://github.com/apache/datafusion/pull/17273#discussion_r2304261369


##########
datafusion/catalog/src/table.rs:
##########
@@ -299,6 +328,75 @@ pub trait TableProvider: Debug + Sync + Send {
     }
 }
 
+#[derive(Debug, Clone, Default)]
+pub struct ScanArgs {
+    preferred_ordering: Option<Vec<SortExpr>>,
+    filters: Option<Vec<Expr>>,
+    projection: Option<Vec<usize>>,
+    limit: Option<usize>,
+}
+
+impl ScanArgs {
+    pub fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
+        self.projection = projection;
+        self
+    }
+
+    pub fn projection(&self) -> Option<Vec<usize>> {
+        self.projection.clone()
+    }
+
+    pub fn with_filters(mut self, filters: Option<Vec<Expr>>) -> Self {
+        self.filters = filters;
+        self
+    }
+
+    pub fn filters(&self) -> Option<&[Expr]> {
+        self.filters.as_deref()
+    }
+
+    pub fn with_limit(mut self, limit: Option<usize>) -> Self {
+        self.limit = limit;
+        self
+    }
+
+    pub fn limit(&self) -> Option<usize> {
+        self.limit
+    }
+
+    pub fn with_preferred_ordering(mut self, ordering: Option<Vec<SortExpr>>) 
-> Self {
+        self.preferred_ordering = ordering;
+        self
+    }
+
+    pub fn preferred_ordering(&self) -> Option<&[SortExpr]> {
+        self.preferred_ordering.as_deref()
+    }
+}
+
+#[derive(Debug, Clone)]
+pub struct ScanResult {
+    /// The ExecutionPlan to run.
+    plan: Arc<dyn ExecutionPlan>,
+    // Remaining filters that were not completely evaluated during 
`scan_with_args()`.
+    // These were previously referred to as "unsupported filters" or "inexact 
filters".
+    filters: Vec<Expr>,

Review Comment:
   I think this is confusing as the "can the filters be supported" is currently 
a separate API. I think we should have one or the other but not both. 
   
   I realize that `scan_with_args` is basically the "create the appropriate 
physical plan"  so logically it makes sense here.



##########
datafusion/catalog/src/table.rs:
##########
@@ -171,6 +173,35 @@ pub trait TableProvider: Debug + Sync + Send {
         limit: Option<usize>,
     ) -> Result<Arc<dyn ExecutionPlan>>;
 
+    async fn scan_with_args(
+        &self,
+        state: &dyn Session,
+        args: ScanArgs,
+    ) -> Result<ScanResult> {
+        let ScanArgs {
+            preferred_ordering: _,
+            filters,
+            projection,
+            limit,
+        } = args;
+        let filters = filters.unwrap_or_default();
+        let unsupported_filters = self
+            .supports_filters_pushdown(&filters.iter().collect_vec())?

Review Comment:
   I recommend doing this as a follow on PR -- and the key consideration would 
be how disruptive it would be to existing users vs the benefit existing users 
and new users would get



##########
datafusion/catalog/src/table.rs:
##########
@@ -171,6 +172,34 @@ pub trait TableProvider: Debug + Sync + Send {
         limit: Option<usize>,
     ) -> Result<Arc<dyn ExecutionPlan>>;
 
+    async fn scan_with_args(

Review Comment:
   I think this is a great new API and makes a lot of sense as it allows us to 
iterate / expand the scan API over time with less API disruption
   
   It is also consistent with 
[`ScalarUDFImpl::invoke_with_args`](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#tymethod.invoke_with_args)
   
   To merge this PR I think we should:
   1. document this function (perhaps move the docs from `scan` here and then 
direct people to use `scan_with_args` with new TableProviders
   2. Migrate all existing code in Datafusion to use `scan_with_args`
   3. Consider deprecating `scan` 
   
   (maybe we can do this as a follow on PR)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to