Blizzara commented on code in PR #14194:
URL: https://github.com/apache/datafusion/pull/14194#discussion_r1925612355
##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -559,12 +559,31 @@ pub fn from_table_scan(
let table_schema = scan.source.schema().to_dfschema_ref()?;
let base_schema = to_substrait_named_struct(&table_schema)?;
+ let best_effort_filter_option = if !scan.filters.is_empty() {
+ let table_schema_qualified = Arc::new(
+ DFSchema::try_from_qualified_schema(
+ scan.table_name.clone(),
+ &(scan.source.schema()),
+ )
+ .unwrap(),
+ );
+ let mut combined_expr = scan.filters[0].clone();
+ for i in 1..scan.filters.len() {
+ combined_expr = combined_expr.and(scan.filters[i].clone());
+ }
+ let best_effort_filter_expr =
+ producer.handle_expr(&combined_expr, &table_schema_qualified)?;
+ Some(Box::new(best_effort_filter_expr))
+ } else {
+ None
+ };
+
Ok(Box::new(Rel {
rel_type: Some(RelType::Read(Box::new(ReadRel {
common: None,
base_schema: Some(base_schema),
filter: None,
- best_effort_filter: None,
+ best_effort_filter: best_effort_filter_option,
Review Comment:
Hm, from reading the Substrait spec, it sounds like the "best effort" filter
is one the read node _can_ use to drop rows but doesn't necessarily _have to_.
So having "col1 < 5" as a best-effort filter would say that any rows not
matching it may be dropped by the read, but it's okay if some of them pass.
The read could then, for example, look at the Parquet footer: if it sees
"min of col1 = 6", it could skip that whole file, but if it sees "min of
col1 = 2, max = 700", it could include the full file, non-matching rows and all.
By contrast, the "filter" presumably must not let any non-matching rows
pass.
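A minimal sketch of the distinction described above, using the same `col1 < 5` example: a best-effort read prunes whole files from min/max statistics but may still return non-matching rows, while an exact filter is evaluated row by row. `ColumnStats`, `can_skip_file`, `best_effort_read`, and `apply_exact_filter` are hypothetical names, and the in-memory "files" stand in for Parquet footers; none of this is DataFusion or Substrait API.

```rust
/// Hypothetical per-file column statistics, standing in for the min/max
/// values a reader might find in a Parquet footer.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    // Not needed for `col1 < bound`; kept only to mirror the "max = 700" example.
    #[allow(dead_code)]
    max: i64,
}

/// Best-effort use of `col1 < bound`: returns true only when the stats prove
/// that no row in the file can match. Returning false does NOT mean that
/// every row matches.
fn can_skip_file(stats: ColumnStats, bound: i64) -> bool {
    stats.min >= bound
}

/// A read that honors `col1 < bound` only on a best-effort basis: whole files
/// are pruned using their stats, but surviving files are returned in full.
fn best_effort_read(files: &[(ColumnStats, Vec<i64>)], bound: i64) -> Vec<i64> {
    files
        .iter()
        .filter(|(stats, _)| !can_skip_file(*stats, bound))
        .flat_map(|(_, rows)| rows.iter().copied())
        .collect()
}

/// Exact use of the same predicate: evaluated row by row, so no
/// non-matching row survives.
fn apply_exact_filter(rows: &[i64], bound: i64) -> Vec<i64> {
    rows.iter().copied().filter(|&v| v < bound).collect()
}

fn main() {
    let bound = 5; // predicate: col1 < 5
    let files = vec![
        (ColumnStats { min: 6, max: 9 }, vec![6, 7, 9]),     // min >= 5: skipped
        (ColumnStats { min: 2, max: 700 }, vec![2, 4, 700]), // overlaps: read in full
    ];

    let best_effort = best_effort_read(&files, bound);
    println!("best effort: {:?}", best_effort); // [2, 4, 700]; 700 slips through

    let exact = apply_exact_filter(&best_effort, bound);
    println!("exact:       {:?}", exact); // [2, 4]
}
```

Both results are "correct" for their contract: the best-effort read is allowed to return 700, and only a later exact filter guarantees it is gone.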
The docstring for `scan.filters` says "/// Optional expressions to be used
as filters by the table provider", so I'm not sure which of the two it falls
under.
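To make the question concrete, here is a minimal sketch of one possible split, assuming the producer knew, per filter, whether the table provider applies it exactly or only partially. `Pred`, `Exactness`, `conjunction`, `untag`, and `split_filters` are hypothetical names; `Exactness` only loosely mirrors DataFusion's `TableProviderFilterPushDown` (`Exact` / `Inexact`), and whether that information is reachable from the `TableScan` at this point is an open assumption. The sketch also writes the diff's AND-combination as a fold with `Iterator::reduce`, which returns `None` for an empty list on its own.

```rust
/// Hypothetical stand-in for a filter expression (DataFusion would use `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Leaf(String),
    And(Box<Pred>, Box<Pred>),
}

impl Pred {
    fn and(self, other: Pred) -> Pred {
        Pred::And(Box::new(self), Box::new(other))
    }

    /// Renders the predicate as text, parenthesizing each AND.
    fn render(&self) -> String {
        match self {
            Pred::Leaf(s) => s.clone(),
            Pred::And(l, r) => format!("({} AND {})", l.render(), r.render()),
        }
    }
}

/// Hypothetical per-filter tag, loosely modeled on DataFusion's
/// `TableProviderFilterPushDown::{Exact, Inexact}`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Exactness {
    Exact,
    Inexact,
}

/// AND-combines the predicates left to right; `None` for an empty list.
/// Same result as the loop in the diff, without a separate emptiness check.
fn conjunction(preds: Vec<Pred>) -> Option<Pred> {
    preds.into_iter().reduce(|acc, p| acc.and(p))
}

/// Drops the tags, keeping predicate order.
fn untag(tagged: Vec<(Pred, Exactness)>) -> Vec<Pred> {
    tagged.into_iter().map(|(p, _)| p).collect()
}

/// Splits tagged filters into (`filter`, `best_effort_filter`): exact ones go
/// to `filter`, inexact ones may only be used by the read to skip data.
fn split_filters(filters: Vec<(Pred, Exactness)>) -> (Option<Pred>, Option<Pred>) {
    let (exact, inexact): (Vec<_>, Vec<_>) = filters
        .into_iter()
        .partition(|(_, e)| *e == Exactness::Exact);
    (conjunction(untag(exact)), conjunction(untag(inexact)))
}

fn main() {
    let filters = vec![
        (Pred::Leaf("col1 < 5".to_string()), Exactness::Inexact),
        (Pred::Leaf("col2 = 1".to_string()), Exactness::Exact),
        (Pred::Leaf("col3 IS NOT NULL".to_string()), Exactness::Inexact),
    ];
    let (filter, best_effort) = split_filters(filters);
    println!("filter:             {:?}", filter.map(|p| p.render()));
    println!("best_effort_filter: {:?}", best_effort.map(|p| p.render()));
}
```

If no such per-filter information is available, the conservative reading is that only `best_effort_filter` is safe to populate, since it never obliges the consumer to drop anything.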
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.