nseekhao commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1975616391
########## datafusion/substrait/src/logical_plan/producer.rs: ########## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_ref()?; let base_schema = to_substrait_named_struct(&table_schema)?; + let best_effort_filter_option = if !scan.filters.is_empty() { + let table_schema_qualified = Arc::new( + DFSchema::try_from_qualified_schema( + scan.table_name.clone(), + &(scan.source.schema()), + ) + .unwrap(), + ); + let mut combined_expr = scan.filters[0].clone(); + for i in 1..scan.filters.len() { + combined_expr = combined_expr.and(scan.filters[i].clone()); + } + let best_effort_filter_expr = + producer.handle_expr(&combined_expr, &table_schema_qualified)?; + Some(Box::new(best_effort_filter_expr)) + } else { + None + }; + Ok(Box::new(Rel { rel_type: Some(RelType::Read(Box::new(ReadRel { common: None, base_schema: Some(base_schema), filter: None, - best_effort_filter: None, + best_effort_filter: best_effort_filter_option, Review Comment: Thanks @westonpace ! Makes sense, but let me make sure my understanding is correct. So let's say we have ```sql select c2 from t1 where c1 > 0; ``` The logical plan will look something like: ``` Projection: expr: c1 Filter: predicate: (c1 > 0) TableScan: table_name: t1 filters: None ... ``` During logical plan optimization, the predicate from `Filter` will get pushed down to `TableScan`. Let's say data is partitioned and metadata contains ranges of `c1`. Since we could have partitions that contain both positive and negative `c1` values, my understanding is that the filter will be `Inexact` to make sure the physical plan knows to keep the original `FilterExec`. So the logical plan will look something like: ``` Projection: expr: c1 Filter: predicate: (c1 > 0) TableScan: table_name: t1 filters: [(c1>0, Inexact)] ... ``` So the logical `TableScan` is saying: * "The filter `c1 > 0` has to be applied" -- communicated via `TableScan#filters` containing `c1 > 0` expression * "But I cannot be sure that I won't get in extra rows due to how the data is partitioned" -- communicated via `Inexact` * "So please make sure the filter relation that asked for this condition stays when we translate this logical plan to physical plan. To ensure the correct results" -- communicated via `Inexact` Is this a correct understanding? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org