westonpace commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1972236021
########## datafusion/substrait/src/logical_plan/producer.rs: ########## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_ref()?; let base_schema = to_substrait_named_struct(&table_schema)?; + let best_effort_filter_option = if !scan.filters.is_empty() { + let table_schema_qualified = Arc::new( + DFSchema::try_from_qualified_schema( + scan.table_name.clone(), + &(scan.source.schema()), + ) + .unwrap(), + ); + let mut combined_expr = scan.filters[0].clone(); + for i in 1..scan.filters.len() { + combined_expr = combined_expr.and(scan.filters[i].clone()); + } + let best_effort_filter_expr = + producer.handle_expr(&combined_expr, &table_schema_qualified)?; + Some(Box::new(best_effort_filter_expr)) + } else { + None + }; + Ok(Box::new(Rel { rel_type: Some(RelType::Read(Box::new(ReadRel { common: None, base_schema: Some(base_schema), filter: None, - best_effort_filter: None, + best_effort_filter: best_effort_filter_option, Review Comment: > I don't think there is a 1-1 mapping between a `TableScan` and the `ReadRel` with filter and best_effort_filter fields. I agree. I think `TableScan`, as a logical node, is stating that all `filters` MUST be applied. The equivalent logical plan in Substrait is a `ReadRel` with `filter`. Datafusion has no _logical_ equivalent to `ReadRel` with `best_effort_filter` and should ignore that field entirely (it is always safe to ignore `best_effort_filter`). In other words, if I am, for example, using the dataframe API, there is no way I can say "do your best to filter on this and I'll take what I can get". Datafusion does have a (somewhat implicit) physical equivalent. The physical plan created by an inexact table provider is equivalent to `ReadRel` with `best_effort_filter`. However, there is no mapping between Substrait and physical plans today. If `best_effort_filter` compatibility is desired then I propose a change be made to Datafusion to add the best_effort_filter to `ReadRel`. I also don't see a strong need for `best_effort_filter`. Datafusion does not need Substrait to tell it which filters should be applied on a best-effort basis and it can ignore that advice entirely. It can figure that information out itself. Even if a Substrait plan only has `filter` then Datafusion is free to use best-effort filtering in the resulting physical plan based on its own calculations from the `TableProvider`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org