alamb commented on code in PR #17273: URL: https://github.com/apache/datafusion/pull/17273#discussion_r2304261369
########## datafusion/catalog/src/table.rs: ########## @@ -299,6 +328,75 @@ pub trait TableProvider: Debug + Sync + Send { } } +#[derive(Debug, Clone, Default)] +pub struct ScanArgs { + preferred_ordering: Option<Vec<SortExpr>>, + filters: Option<Vec<Expr>>, + projection: Option<Vec<usize>>, + limit: Option<usize>, +} + +impl ScanArgs { + pub fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self { + self.projection = projection; + self + } + + pub fn projection(&self) -> Option<Vec<usize>> { + self.projection.clone() + } + + pub fn with_filters(mut self, filters: Option<Vec<Expr>>) -> Self { + self.filters = filters; + self + } + + pub fn filters(&self) -> Option<&[Expr]> { + self.filters.as_deref() + } + + pub fn with_limit(mut self, limit: Option<usize>) -> Self { + self.limit = limit; + self + } + + pub fn limit(&self) -> Option<usize> { + self.limit + } + + pub fn with_preferred_ordering(mut self, ordering: Option<Vec<SortExpr>>) -> Self { + self.preferred_ordering = ordering; + self + } + + pub fn preferred_ordering(&self) -> Option<&[SortExpr]> { + self.preferred_ordering.as_deref() + } +} + +#[derive(Debug, Clone)] +pub struct ScanResult { + /// The ExecutionPlan to run. + plan: Arc<dyn ExecutionPlan>, + // Remaining filters that were not completely evaluated during `scan_with_args()`. + // These were previously referred to as "unsupported filters" or "inexact filters". + filters: Vec<Expr>, Review Comment: I think this is confusing as the "can the filters be supported" is currently a separate API. I think we should have one or the other but not both. I realize that `scan_with_args` is basically the "create the appropriate physical plan" so logically it makes sense here. ########## datafusion/catalog/src/table.rs: ########## @@ -171,6 +173,35 @@ pub trait TableProvider: Debug + Sync + Send { limit: Option<usize>, ) -> Result<Arc<dyn ExecutionPlan>>; + async fn scan_with_args( + &self, + state: &dyn Session, + args: ScanArgs, + ) -> Result<ScanResult> { + let ScanArgs { + preferred_ordering: _, + filters, + projection, + limit, + } = args; + let filters = filters.unwrap_or_default(); + let unsupported_filters = self + .supports_filters_pushdown(&filters.iter().collect_vec())? Review Comment: I recommend doing this as a follow on PR -- and the key consideration would be how disruptive it would be to existing users vs the benefit existing users and new users would get ########## datafusion/catalog/src/table.rs: ########## @@ -171,6 +172,34 @@ pub trait TableProvider: Debug + Sync + Send { limit: Option<usize>, ) -> Result<Arc<dyn ExecutionPlan>>; + async fn scan_with_args( Review Comment: I think this is a great new API and makes a lot of sense as it allows us to iterate / expand the scan API over time with less API disruption It is also consistent with [`ScalarUDFImpl::invoke_with_args`](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#tymethod.invoke_with_args) To merge this PR I think we should: 1. document this function (perhaps move the docs from `scan` here and then direct people to use `scan_with_args` with new TableProviders 2. Migrate all existing code in Datafusion to use `scan_with_args` 3. Consider deprecating `scan` (maybe we can do this as a follow on PR) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org