SAY-5 opened a new pull request, #21646:
URL: https://github.com/apache/datafusion/pull/21646

   Closes #21642.
   
   When `PushDownFilter` collapses a parent `Filter` into a child `Filter`, the 
combined predicate list was built as 
`parents_predicates.chain(child_predicates)`:
   
   ```rust
   LogicalPlan::Filter(child_filter) => {
       let parents_predicates = split_conjunction_owned(filter.predicate);
       let child_predicates = split_conjunction_owned(child_filter.predicate);
       let new_predicates = parents_predicates
           .into_iter()
           .chain(child_predicates)
           // use IndexSet to remove dupes while preserving predicate order
           .collect::<IndexSet<_>>()
           ...
   ```
   
   Because the unoptimized plan evaluates inner (child) filters before outer 
(parent) filters, this reversed the user-authored execution order of selective 
predicates. The reproducer in the issue:
   
   ```rust
   LogicalPlanBuilder::from(scan)
       .filter(col("a").eq(lit(10)))?    // applied first (inner)
       .filter(col("b").gt(lit(11)))?    // applied second (outer)
   ```
   
   …was optimized into `Filter: b > 11 AND a = 10` rather than the expected 
`Filter: a = 10 AND b > 11`. In environments without filter-selectivity 
statistics this matters: users who place a cheap or highly-selective filter 
first expect it to execute first.
   
   ### Fix
   
   Build the conjunction as `child_predicates.chain(parents_predicates)` so 
`PushDownFilter` actually preserves the execution order it claims to in the 
existing `use IndexSet to remove dupes while preserving predicate order` 
comment.
   
   ### Snapshot update
   
   `two_filters_on_same_depth` had its `assert_optimized_plan_equal!` snapshot 
frozen at the buggy order. The unoptimized plan in that test applies `a <= 1` 
first (inner filter) and `a >= 1` second (outer), so the optimized result must 
be `a <= 1 AND a >= 1`. Updated the snapshot and added an inline comment 
documenting the invariant.
   
   ### Verification
   
   `cargo test -p datafusion-optimizer --lib push_down_filter` passes 79/79 
tests with the fix and the snapshot update applied.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to