nseekhao commented on code in PR #14194:
URL: https://github.com/apache/datafusion/pull/14194#discussion_r1975616391


##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -559,12 +559,31 @@ pub fn from_table_scan(
     let table_schema = scan.source.schema().to_dfschema_ref()?;
     let base_schema = to_substrait_named_struct(&table_schema)?;
 
+    let best_effort_filter_option = if !scan.filters.is_empty() {
+        let table_schema_qualified = Arc::new(
+            DFSchema::try_from_qualified_schema(
+                scan.table_name.clone(),
+                &(scan.source.schema()),
+            )
+            .unwrap(),
+        );
+        let mut combined_expr = scan.filters[0].clone();
+        for i in 1..scan.filters.len() {
+            combined_expr = combined_expr.and(scan.filters[i].clone());
+        }
+        let best_effort_filter_expr =
+            producer.handle_expr(&combined_expr, &table_schema_qualified)?;
+        Some(Box::new(best_effort_filter_expr))
+    } else {
+        None
+    };
+
     Ok(Box::new(Rel {
         rel_type: Some(RelType::Read(Box::new(ReadRel {
             common: None,
             base_schema: Some(base_schema),
             filter: None,
-            best_effort_filter: None,
+            best_effort_filter: best_effort_filter_option,

Review Comment:
   Thanks @westonpace ! Makes sense, but let me make sure my understanding is 
correct.
   
   So let's say we have
   ```sql
   select c2 from t1 where c1 > 0;
   ```
   The logical plan will look something like:
   ```
   Projection:
     expr: c1
     Filter:
       predicate: (c1 > 0)
       TableScan:
         table_name: t1
         filters: None
         ...
   ```
   During logical plan optimization, the predicate from `Filter` will get 
pushed down to `TableScan`. Let's say data is partitioned and metadata contains 
ranges of `c1`. Since we could have partitions that contain both positive and 
negative `c1` values, my understanding is that the filter will be `Inexact` to 
make sure the physical plan knows to keep the original `FilterExec`. So the 
logical plan will look something like:
   ```
   Projection:
     expr: c1
     Filter:
       predicate: (c1 > 0)
       TableScan:
         table_name: t1
         filters: [(c1>0, Inexact)]
         ...
   ```
   So the logical `TableScan` is saying:
   * "The filter `c1 > 0` has to be applied" -- communicated via 
`TableScan#filters` containing `c1 > 0` expression
   * "But I cannot be sure that I won't get in extra rows due to how the data 
is partitioned" --  communicated via `Inexact`
   * "So please make sure the filter relation that asked for this condition 
stays when we translate this logical plan to physical plan. To ensure the 
correct results" --  communicated via `Inexact`
   
   Is this a correct understanding?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to