westonpace commented on code in PR #14194:
URL: https://github.com/apache/datafusion/pull/14194#discussion_r1972236021


##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -559,12 +559,31 @@ pub fn from_table_scan(
     let table_schema = scan.source.schema().to_dfschema_ref()?;
     let base_schema = to_substrait_named_struct(&table_schema)?;
 
+    let best_effort_filter_option = if !scan.filters.is_empty() {
+        let table_schema_qualified = Arc::new(
+            DFSchema::try_from_qualified_schema(
+                scan.table_name.clone(),
+                &(scan.source.schema()),
+            )
+            .unwrap(),
+        );
+        let mut combined_expr = scan.filters[0].clone();
+        for i in 1..scan.filters.len() {
+            combined_expr = combined_expr.and(scan.filters[i].clone());
+        }
+        let best_effort_filter_expr =
+            producer.handle_expr(&combined_expr, &table_schema_qualified)?;
+        Some(Box::new(best_effort_filter_expr))
+    } else {
+        None
+    };
+
     Ok(Box::new(Rel {
         rel_type: Some(RelType::Read(Box::new(ReadRel {
             common: None,
             base_schema: Some(base_schema),
             filter: None,
-            best_effort_filter: None,
+            best_effort_filter: best_effort_filter_option,

Review Comment:
   > I don't think there is a 1-1 mapping between a `TableScan` and the 
`ReadRel` with filter and best_effort_filter fields.
   
   I agree.  I think `TableScan`, as a logical node, is stating that all 
`filters` MUST be applied.  The equivalent logical plan in Substrait is a 
`ReadRel` with `filter`.
   
   Datafusion has no _logical_ equivalent to `ReadRel` with 
`best_effort_filter` and should ignore that field entirely (it is always safe 
to ignore `best_effort_filter`).  In other words, if I am, for example, using 
the dataframe API, there is no way I can say "do your best to filter on this 
and I'll take what I can get".
   
   Datafusion does have a (somewhat implicit) physical equivalent.  The 
physical plan created by an inexact table provider is equivalent to `ReadRel` 
with `best_effort_filter`.  However, there is no mapping between Substrait and 
physical plans today.
   
   If `best_effort_filter` compatibility is desired then I propose a change be 
made to Datafusion to add the best_effort_filter to `ReadRel`.
   
   I also don't see a strong need for `best_effort_filter`.  Datafusion does 
not need Substrait to tell it which filters should be applied on a best-effort 
basis and it can ignore that advice entirely.  It can figure that information 
out itself.  Even if a Substrait plan only has `filter` then Datafusion is free 
to use best-effort filtering in the resulting physical plan based on its own 
calculations from the `TableProvider`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to