joroKr21 opened a new issue, #21642:
URL: https://github.com/apache/datafusion/issues/21642

   ### Describe the bug
   
   The `PushDownFilter` logical optimizer rule has `ApplyOrder::TopDown` and it 
claims to preserve filter oder:
   "use IndexSet to remove dupes while preserving predicate order" but this 
code is actually reversing the order of multiple filter plan nodes (as opposed 
to a single filter with a conjunction expression):
   
   ```rust
                   let parents_predicates = 
split_conjunction_owned(filter.predicate);
   
                   // remove duplicated filters
                   let child_predicates = 
split_conjunction_owned(child_filter.predicate);
                   let new_predicates = parents_predicates
                       .into_iter()
                       .chain(child_predicates)
                       // use IndexSet to remove dupes while preserving 
predicate order
                       .collect::<IndexSet<_>>()
                       .into_iter()
                       .collect::<Vec<_>>();
   ```
   
   Note that the execution order of the unoptimized plan would be - child 
filters first (as inner nodes are the input of outer nodes) and then parent 
filters. But the optimized filter predicates are parent filters first, then 
child filters.
   
   ### To Reproduce
   
   If I make a small test change:
   ```diff
   diff --git a/datafusion/optimizer/src/push_down_filter.rs 
b/datafusion/optimizer/src/push_down_filter.rs
   index a1a636cfef..1f8dbc91ce 100644
   --- a/datafusion/optimizer/src/push_down_filter.rs
   +++ b/datafusion/optimizer/src/push_down_filter.rs
   @@ -3239,7 +3239,8 @@ mod tests {
                vec![col("a").eq(lit(10i64)), col("b").gt(lit(11i64))],
                Some(vec![0]),
            )?
   -        .filter(and(col("a").eq(lit(10i64)), col("b").gt(lit(11i64))))?
   +        .filter(col("a").eq(lit(10i64)))?
   +        .filter(col("b").gt(lit(11i64)))?
            .project(vec![col("a"), col("b")])?
            .build()?;
   ```
   
   the predicates are reversed:
   ```
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Snapshot: multi_combined_filter
   Source: datafusion/optimizer/src/push_down_filter.rs:3247
   
────────────────────────────────────────────────────────────────────────────────
   Expression: optimized_plan
   
────────────────────────────────────────────────────────────────────────────────
   -old snapshot
   +new results
   
────────────┬───────────────────────────────────────────────────────────────────
       1     1 │ Projection: a, b
       2       │-  Filter: a = Int64(10) AND b > Int64(11)
             2 │+  Filter: b > Int64(11) AND a = Int64(10)
       3     3 │     TableScan: test projection=[a], partial_filters=[a = 
Int64(10), b > Int64(11)]
   
────────────┴───────────────────────────────────────────────────────────────────
   ```
   
   ### Expected behavior
   
   I expect `PushDownFilter` to preserve the execution order of predicates as 
in the unoptimized plan.
   
   ### Additional context
   
   In environments where we don't have information about the filter 
selectivity, the user might have domain knowledge and put the more selective 
filters first, expecting a boost in query performance. But counterintuitively, 
using two filters instead of one filter with a conjunction, reverses the 
execution order.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to