joroKr21 opened a new issue, #21642:
URL: https://github.com/apache/datafusion/issues/21642
### Describe the bug
The `PushDownFilter` logical optimizer rule has `ApplyOrder::TopDown` and it
claims to preserve filter oder:
"use IndexSet to remove dupes while preserving predicate order" but this
code is actually reversing the order of multiple filter plan nodes (as opposed
to a single filter with a conjunction expression):
```rust
let parents_predicates =
split_conjunction_owned(filter.predicate);
// remove duplicated filters
let child_predicates =
split_conjunction_owned(child_filter.predicate);
let new_predicates = parents_predicates
.into_iter()
.chain(child_predicates)
// use IndexSet to remove dupes while preserving
predicate order
.collect::<IndexSet<_>>()
.into_iter()
.collect::<Vec<_>>();
```
Note that the execution order of the unoptimized plan would be - child
filters first (as inner nodes are the input of outer nodes) and then parent
filters. But the optimized filter predicates are parent filters first, then
child filters.
### To Reproduce
If I make a small test change:
```diff
diff --git a/datafusion/optimizer/src/push_down_filter.rs
b/datafusion/optimizer/src/push_down_filter.rs
index a1a636cfef..1f8dbc91ce 100644
--- a/datafusion/optimizer/src/push_down_filter.rs
+++ b/datafusion/optimizer/src/push_down_filter.rs
@@ -3239,7 +3239,8 @@ mod tests {
vec![col("a").eq(lit(10i64)), col("b").gt(lit(11i64))],
Some(vec![0]),
)?
- .filter(and(col("a").eq(lit(10i64)), col("b").gt(lit(11i64))))?
+ .filter(col("a").eq(lit(10i64)))?
+ .filter(col("b").gt(lit(11i64)))?
.project(vec![col("a"), col("b")])?
.build()?;
```
the predicates are reversed:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: multi_combined_filter
Source: datafusion/optimizer/src/push_down_filter.rs:3247
────────────────────────────────────────────────────────────────────────────────
Expression: optimized_plan
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
1 1 │ Projection: a, b
2 │- Filter: a = Int64(10) AND b > Int64(11)
2 │+ Filter: b > Int64(11) AND a = Int64(10)
3 3 │ TableScan: test projection=[a], partial_filters=[a =
Int64(10), b > Int64(11)]
────────────┴───────────────────────────────────────────────────────────────────
```
### Expected behavior
I expect `PushDownFilter` to preserve the execution order of predicates as
in the unoptimized plan.
### Additional context
In environments where we don't have information about the filter
selectivity, the user might have domain knowledge and put the more selective
filters first, expecting a boost in query performance. But counterintuitively,
using two filters instead of one filter with a conjunction, reverses the
execution order.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]